Transmission of Digital Data Words Representing a Signal Waveform

BACKGROUND TO THE INVENTION

This invention relates to methods and apparatus for
transmitting, encoding and decoding data signals in the
parts of low significance of the digital words representing
signal waveforms, particularly in applications where the
degradation of the waveform signal resulting from the data
coding is desired to be of minimal or benign effect.

In many applications where a signal waveform is represented
by a sequence of digital words, the accuracy of the digital
word is greater than is strictly required for a
satisfactory representation of the original waveform. For
example, in compact disc audio, audio signal waveforms are
represented by 16-bit wordlength data sampled at 44.1 kHz,
and by use of techniques such as dither and noise shaping
(see the references), this
wordlength is capable of producing a perceived dynamic
range exceeding 110 dB, whereas existing technologies and
consumer requirements rarely require perceived dynamic
ranges of more than 90 or 100 dB. Similarly, professional
digital video standards often use 10 bit words to represent
video signal waveforms, which has a significant quality
margin above the quality found acceptable to most viewers.

In such situations, it may be desired to reallocate some of
the information used to transmit the waveform to instead
transmit and receive other data signals. For example, in
compact disc audio, only two channels of stereo information
are conveyed by the audio words, whereas systems of sound
reproduction and recording using three or more related
channels of audio information are found to be subjectively
preferable to conventional two-channel stereo. It may
therefore be desired to reallocate some of the data in the
compact disc audio words to transmitting additional audio
channel signals. Many other applications, some of which
are detailed in section 1 below, exist

in which data is reallocated from the existing waveform
information to other uses. [...]

For example, it is undesirable to reallocate audio word
data in a way that gives a markedly degraded sound when
the disc is played on existing players.

Methods are already known in the prior art of reallocating
waveform word data to other uses in such a way that the
reallocated data produces a waveform error designed to be
minimally perceivable. For example, in ref. [5] there is
described a method whereby an audio signal is split into
sub-bands [...]

This sub-band method has several disadvantages. Firstly, it
is complicated, requiring considerable signal processing
complexity. Secondly, there is an inherent time delay in
the signal processing involved in splitting the signal into
subbands. Thirdly, the data rate that can be conveyed by
the sub-band method is reduced to a low level for small
input waveform signals.

A particular disadvantage of the sub-band method is that it
relies on models of auditory masking perceptually to hide
the error caused by data coding in the signal waveform
words, coding the data at levels within particular
sub-bands that are determined adaptively by models of
auditory masking to be masked by the audio signal. Such
masking models are still imperfect. Moreover, masking
thresholds [...]



WO 94/18762

PCT/GB94/00297

ears. Thus the waveform degradations produced by the sub- 
band method will in general produce audible effects that 
may not be acceptable for the highest quality uses. 

Another approach to transmitting data in an audio waveform,
for use with the NICAM system, has been described by Emmett
[22], in which the shape of the error spectrum is
adaptively changed to be masked by the audio signal. The
present proposal does not require the use of such
level-adaptive data rates.

A further disadvantage of the sub-band method is that in
practice it is not efficient in an information-theoretic
sense.



SUMMARY OF THE INVENTION

The invention allows coding of data within the digital 
words representing signal waveforms such that the coding is 
efficient in the sense of information theory, thereby 
minimising added error noise levels, involves only short or 
zero time delays in the signal processing, and in which 
nonlinear distortion and data-related error variation 
effects are avoided, and also allows if desired avoidance 
of all modulation noise effects as well. The invention 
also allows the spectral characteristics of the error noise 
to be modified so as to minimise its perceptual level, 
which in general depends on the spectral characteristics.

These advantages of the invention permit data to be encoded 
within the words of signal waveforms at a higher data rate 
and with less waveform degradation than was possible in the 
prior art.



According to the invention in a first aspect, there is
provided a method of encoding digital data within digital
words representing signal waveforms, including the step of
modifying least significant digits of said digital words
representing signal waveforms in dependence upon said
digital data,

characterised by pseudo-randomising said digital data,
thereby forming data noise words having levels small
relative to those of said waveform words,

subtracting the pseudo-randomised data words from said
waveform words, thereby producing a dithered waveform word,
and

quantizing said dithered waveform word and adding said
data noise word to said quantized word, thereby forming an
output of reduced noise carrying information representing
digital data in the least significant digits thereof.
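
The subtract, quantize and add steps of the first aspect can be sketched as follows. This is a minimal illustration only, assuming binary least significant digits, a rounding uniform quantizer of step 2^n, and a data noise word that has already been pseudo-randomised. The function name `encode_word` and its parameters are illustrative and do not appear in this description, and clipping at the ends of the word range is not handled.

```python
def encode_word(waveform_word: int, data_noise: int, n_lsbs: int) -> int:
    """Bury an n_lsbs-bit data noise word in the LSBs of one waveform word."""
    step = 1 << n_lsbs  # quantizer step size, 2**n_lsbs (assumed binary digits)
    if not 0 <= data_noise < step:
        raise ValueError("data noise word must fit in n_lsbs bits")
    # Subtract the data noise word, giving the dithered waveform word.
    dithered = waveform_word - data_noise
    # Uniform quantizer: round to the nearest multiple of the step size.
    quantized = ((dithered + step // 2) // step) * step
    # Add the data noise word back; it now occupies the n_lsbs LSBs.
    return quantized + data_noise


if __name__ == "__main__":
    out = encode_word(waveform_word=12345, data_noise=0b1011, n_lsbs=4)
    # The low four bits of the output carry the data noise word.
    print(out, bin(out & 0b1111))
```

Because the quantized word is a multiple of the step size, the low n bits of the output equal the data noise word, while the output differs from the input by at most half a step.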

The method may be implemented using:

means of receiving input digital waveform words
representing input waveform signals,

means of receiving input data information,

means for outputting output digital waveform words
representing an output waveform signal and
incorporating data information,

means for pseudo-randomising said data information
and for forming it into a word signal, termed the data
noise signal, having a level or range of levels small
relative to that of the waveform words,

means for subtracting said data noise signal from
digital words representing said input waveform,
producing dithered waveform words,

means for uniformly quantizing said dithered
waveform words, and

means for adding said data noise signal to the
output of said uniform quantizing means to produce
output digital words,

wherein least significant digits of the digital words
representing signal waveforms are replaced in the output
digital words by information representing said data
information in a pseudo-randomised form.

The least significant digits of a digital word may be the
least significant digits in a binary representation of the
word, or the least significant digits in representations of
the word in any other integer base or bases.
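
As an illustration of least significant digits in a base other than two, the sketch below buries one base-3 digit per word, which carries log2(3), or about 1.58, bits per sample. It is a minimal sketch under the assumption of a rounding quantizer whose step equals the base; the names `encode_digit` and `decode_digit` are hypothetical.

```python
import math


def encode_digit(waveform_word: int, digit: int, base: int) -> int:
    """Bury one base-`base` digit in the least significant digit of a word."""
    if not 0 <= digit < base:
        raise ValueError("digit out of range for this base")
    dithered = waveform_word - digit
    # Round to the nearest multiple of the base (the quantizer step).
    quantized = ((dithered + base // 2) // base) * base
    return quantized + digit


def decode_digit(word: int, base: int) -> int:
    """Recover the buried digit as the remainder modulo the base."""
    return word % base


if __name__ == "__main__":
    word = encode_digit(1000, 2, base=3)
    print(word, decode_digit(word, 3), math.log2(3))
```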

In a preferred implementation of the invention in this
aspect, there is provided noise shaping means around said
uniform quantizing means adapted to modify the spectrum of
the difference between output and input waveform signals in
a desired predetermined manner.

In one preferred implementation of the invention in its
first aspect, said uniform quantizer means may be a uniform
vector quantizer for a plurality n of signal channels in
the sense defined below, and said data noise signal may be
a vector noise signal in said plurality n of signal
channels.

In preferred implementations of the invention in its first
aspect, the difference between output and input waveform
signals has the form of a noise signal substantially free
of nonlinear distortion products related to the input
waveform signal, because the data noise signal has a
probability distribution function adapted to subtractively
dither the uniform quantizer with substantially no
resulting nonlinear distortion.

Additionally preferred implementations of the invention in
its first aspect provide for encoding data at a constant
data rate, and the difference between output and input
waveform signals has the form of a noise signal
substantially free of nonlinear distortion products related
to the input waveform signal, and substantially free of
variations in statistics dependent on the encoded data or
on the input waveform signal.

According to the invention in a second aspect, there is
provided a means for decoding data information encoded into
the least significant digits of digital waveform words
representing waveform signals, comprising

means for receiving said digital waveform words,

means of separating said least significant digits
from said digital waveform words,

means for inverting pseudo-random encoding in said
least significant digits to provide data information,
and

means of outputting said data information.
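
The decoding steps of the second aspect can be sketched as follows. This is a minimal illustration, assuming binary LSBs and a self-synchronising shift-register scrambler with exclusive-or feedback as the pseudo-random encoding. The feedback delays in `TAPS` are hypothetical values chosen only to keep the example short, and the function names are not taken from this description.

```python
TAPS = (3, 5)  # hypothetical feedback delays, for illustration only


def scramble(bits, taps=TAPS):
    """Encoder side: XOR each data bit with delayed transmitted bits."""
    history = [0] * max(taps)  # previously transmitted bits, newest first
    out = []
    for b in bits:
        s = b
        for t in taps:
            s ^= history[t - 1]
        out.append(s)
        history = [s] + history[:-1]
    return out


def descramble(bits, taps=TAPS):
    """Decoder side: XOR each received bit with delayed received bits."""
    history = [0] * max(taps)  # previously received bits, newest first
    out = []
    for s in bits:
        d = s
        for t in taps:
            d ^= history[t - 1]
        out.append(d)
        history = [s] + history[:-1]
    return out


def decode_words(words, n_lsbs, taps=TAPS):
    """Separate the LSBs of each waveform word and invert the scrambling."""
    bits = []
    for w in words:
        field = w & ((1 << n_lsbs) - 1)  # separate the n_lsbs LSBs
        # Unpack the field most significant bit first (an assumed ordering).
        bits.extend((field >> k) & 1 for k in reversed(range(n_lsbs)))
    return descramble(bits, taps)
```

Because the descrambler depends only on previously received bits, it needs no separate synchronising signal: started at an arbitrary point in the stream, its output is correct once `max(TAPS)` bits have been received.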

According to the invention in a third aspect, there is
provided a system for encoding and decoding data
information within the least significant digits of digital
waveform words representing waveform signals, comprising
encoding means according to the above first aspect,
decoding means according to the above second aspect,
and transmission means for conveying the output of said
encoding means to the input of said decoding means.

The said transmission means may, by way of example, be a
wire or optical link, or a link using radio, acoustic or
infra-red waves, or may be via a storage medium such as
memory storage media, hard disc media, magnetic tape or
optical disc recording, storage and playback media or any
sequential combination of these.

When applied to audio CD (compact disc), the invention
provides a new method for burying a high data rate data
channel (with up to 360 kbit/s or more) compatibly within
the data stream of an audio CD without significant
impairment of existing CD performance. A proposal in this
description is to replace a number (typically up to four
per channel) of the least significant bits (LSBs) of the
audio words by other data, and to use the psychoacoustic
noise shaping techniques associated with noise shaped
subtractive dither to reduce the audibility of the
resulting added noise down to a subjective perceived level
equal to that of conventional CD.

Simply replacing the LSBs of existing audio data would, of
course, cause a drastic audible modification of the existing
audio signal for two reasons:

1) The wordlength of existing signals would be truncated
to (say) only 12 bits, which would not only reduce the
basic quantization resolution by 24 dB, but also would
introduce the problems of added distortion and modulation
noise caused by truncation (e.g. see refs. [1-4]).

2) Additionally, the replaced last (say) 4 LSBs would
themselves constitute an added noise signal, which itself
may not have a perceptually desirable random-noise-like
quality, and will also add to the perceived noise level in
the main audio signal, typically increasing the noise by a
further 3 dB above that due to truncation alone, giving in
this case as much as 27 dB degradation total in noise
performance.
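
The 24 dB and 27 dB figures follow from simple arithmetic, sketched below. The sketch assumes that each removed bit costs a factor of two in amplitude resolution, and that the noise of the replaced LSBs is uncorrelated with the truncation noise and of about equal power; the function names are illustrative.

```python
import math


def resolution_loss_db(bits_removed: int) -> float:
    """Loss of quantization resolution when bits_removed LSBs are dropped."""
    return 20.0 * math.log10(2 ** bits_removed)


def power_sum_db(*levels_db: float) -> float:
    """Level of the sum of uncorrelated noises given their levels in dB."""
    return 10.0 * math.log10(sum(10.0 ** (level / 10.0) for level in levels_db))


if __name__ == "__main__":
    truncation = resolution_loss_db(4)             # about 24 dB for 4 LSBs
    total = power_sum_db(truncation, truncation)   # about 3 dB more
    print(round(truncation, 2), round(total, 2))
```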

The invention incorporates methods of overcoming all these
problems in replacing the last few LSBs of an audio signal
by other data. The new method involves the following
preferred steps:

A) Using a pseudo-random encode/decode process, operating
only on the LSB data stream itself without extra
synchronizing signals, to make the added LSB data
effectively of random noise form, so that the added signal
becomes truly noise-like.

B) Using this pseudo-random data signal as a subtractive
dither signal (e.g. see [1-4]), so that simultaneously it
does not add to the perceived noise and that it removes all
nonlinear distortion and modulation noise effects caused by
truncation. Remarkably, and unlike in the ordinary
subtractive dither case [3], this does not require the use
of a special subtractive dither decoder, so that the
process works on a standard off-the-shelf CD player, and

C) preferably additionally, at the encoding stage,
incorporating psychoacoustically optimized noise shaping of
the (subtractive) truncation error, thereby reducing the
perceived truncation noise error by around 17 dB further.
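
Steps B and C can be sketched together as an error-feedback loop around the data-dithered quantizer. This is a minimal illustration only: the single feedback tap `h = (1.0,)` is a hypothetical first-order shaper that pushes the error towards high frequencies, not the psychoacoustically optimized filter referred to above, and the data noise words are assumed to be already pseudo-randomised.

```python
import math


def encode_noise_shaped(samples, data_noise, n_lsbs, h=(1.0,)):
    """Noise-shaped burial of data noise words in the LSBs of a sample stream.

    With feedback taps h, the total error (output minus input) is the
    quantizer error filtered by 1 - h[0]*z^-1 - h[1]*z^-2 - ...
    """
    step = 1 << n_lsbs
    errors = [0.0] * len(h)  # past quantizer errors e[n-1], e[n-2], ...
    out = []
    for x, r in zip(samples, data_noise):
        # Error feedback: subtract the filtered past errors from the input.
        v = x - sum(hk * ek for hk, ek in zip(h, errors))
        # Subtract the data noise word, then quantize uniformly by rounding.
        q = step * math.floor((v - r) / step + 0.5)
        # Add the data noise word back so it occupies the LSBs.
        y = q + r
        out.append(y)
        errors = [y - v] + errors[:-1]
    return out
```

Whatever the feedback taps, the quantized word is a multiple of the step size, so the low n bits of every output word still equal the data noise word and a decoder needs no knowledge of the noise shaper.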

The overall effect of combining these three processes is
that if one incorporates data into the last few LSBs, then
the effects of distortion, modulation noise and perceived
audible patterns in the LSB data are completely removed,
and the resulting perceived steady noise is reduced by
around 23 dB below that of ordinary unshaped optimally
dithered quantization to the same number of bits. For
example, when the last 4 LSBs of the 16 bit CD wordlength
are used for buried-channel data, the perceived S/N
(signal-to-noise ratio) is around 91 dB, approximately the
same as ordinary 16 bit CD quality when unshaped dither is
used.

The result of this process is that as much as 2 x 4 = 8
bits of data per stereo sample is available for buried data
without significant loss of audio quality on CD, giving a
data rate of 8 x 44.1 = 352.8 kbit/s.
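
The data rate arithmetic can be checked with a short calculation. The sketch assumes two audio channels sampled at 44.1 kHz, as on CD; the function name `buried_rate_kbit_s` is illustrative.

```python
def buried_rate_kbit_s(lsbs_per_sample: float,
                       channels: int = 2,
                       sample_rate_hz: int = 44_100) -> float:
    """Buried-channel data rate in kbit/s for a given number of LSBs."""
    return lsbs_per_sample * channels * sample_rate_hz / 1000.0


if __name__ == "__main__":
    for lsbs in (4, 2, 1):
        print(lsbs, buried_rate_kbit_s(lsbs))
```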

While the new process achieves potentially high data rates
for the buried channel, it does of course reduce room for
improvements in CD audio quality approaching 20 bits
effective audio quality, such as described in refs.
[3], [4]. However, there is no reason why the process
should only be used with one fixed number of LSBs, and by
reducing the data rate of the buried channel to a smaller
number of LSBs, one correspondingly improves the resolution
of the audio, for example achieving an effective perceived
S/N of around 103 dB for a system using 2 LSBs of data per
signal channel sample, with a data rate still of 176.4
kbit/s.

One can even make the number of LSBs used fractional, say
[...] LSBs per sample. This may be used either to
precisely match the buried channel to a desired data rate,
or to minimize the loss of audio quality, especially at
very low data rates.

Additionally, by including in the LSB data channel itself
low-rate data indicating the number of LSBs "stolen" from
the main audio channels, it is possible to vary the number
of LSBs stolen in a time-variant way, so that, for example,
more LSBs can be taken by the buried channel when the
resulting error is masked by a high-level main audio
signal. The noise-shaping can also be varied adaptively at
the encoding stage so that at high audio levels, the noise
error is maximally masked by the audio signal, thereby
increasing the data rate of the buried channel during loud
passages to, in some cases, as much as 700 kbit/s.

It is also shown in this description that with stereo
signals, it is possible to code data jointly in the least
significant parts of the audio words of the two (or more)
channels, using a multichannel version of the data encoding
process involving the use of uniform vector quantizers and
subtractive vector dithering by a multichannel
pseudo-random data signal for the dithering. The basic
theory of vector dithering is described in section 5. It is
shown that the vector multichannel version of the data
coding process ensures left/right symmetry of any added
noise in the audio reproduction, and an advantageous noise
performance.

The approach in this invention is substantially different
from an alternative method of burying data described in
[5], which involved a process of splitting the audio signal
into subbands, replacing the LSBs of the subbands with data
based on auditory masking theory, and then reassembling the
resulting data by recombining the subbands. Not only is
that process very complicated, with a considerable
time-delay penalty in the subband encoding/decoding
process, but it has to be done with extraordinary precision
to prevent data errors in the band splitting and
recombining process. By contrast, the present process
involves little time delay, involves relatively simple
signal processing, and further is such as to guarantee the
lack of audible side-effects due to nonlinear distortion,
modulation noise or data-related audible patterns.

1. Uses Of Buried Data

1.1 Additional audio channels

One application of a buried data channel, particularly
with an audio CD, is to transmit alternative mixes of sound
to that conveyed in the main channels. For example, a
data-reduced audio signal may be conveyed using the buried
data channel to convey an alternative sound mix of a piece
of music particularly suited for special listening
conditions, such as radio air play or use in exceptionally
noisy environments such as in-car or background music use.

A further extension of such uses is in library music,
where functional music for use as backgrounds in radio,
film, advertising, multi-media, audio visual or
television productions is put onto CD. With existing
library music, essentially only one mix can be conveyed on
a track of a CD, but by incorporating in the buried data
channels additional data-compressed mixes or submixes in
synchronism with the main channels, alternative mixes can
be created by mixing together information from the main and
buried data audio channels.

Typically, by way of example, one might have three
basic stereo mixes A, B and C which may for example
convey the respective rhythm, harmony and melody lines of
a piece of music. The main stereo channels may contain a
pre-determined mix $a_1A + b_1B + c_1C$ for mixing
coefficients $a_1$, $b_1$, $c_1$ for general use, and the
data compressed channels may convey two further mixes
$a_2A + b_2B + c_2C$ and $a_3A + b_3B + c_3C$ for mixing
coefficients $a_2$, $b_2$, $c_2$ and $a_3$, $b_3$, $c_3$.
After data recovery and data compression decoding of the
additional audio channels, any mix of the form

$$d_1(a_1A + b_1B + c_1C) + d_2(a_2A + b_2B + c_2C) + d_3(a_3A + b_3B + c_3C)$$

may be recovered to obtain any desired mix
$(a_0A + b_0B + c_0C)$ of A, B and C. This technically may
be done by putting the mixing coefficients in the 3x3
matrix

$$M = \begin{pmatrix} a_1 & b_1 & c_1 \\ a_2 & b_2 & c_2 \\ a_3 & b_3 & c_3 \end{pmatrix},$$

multiplying the three signals $a_1A + b_1B + c_1C$,
$a_2A + b_2B + c_2C$, $a_3A + b_3B + c_3C$ by its 3x3
matrix inverse $M^{-1}$ to recover A, B and C, and then
forming whatever mix of these is required using
conventional mixing methods. The encoding and decoding
stages of such a proposal are shown schematically in
Figure 16.
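
The matrix recovery of A, B and C can be sketched as follows for one sample instant. This is a minimal illustration using exact rational arithmetic from the Python standard library; the function names and the example coefficients are hypothetical, and a practical decoder would apply the same 3x3 inverse to every sample of the three decoded signals.

```python
from fractions import Fraction


def det3(m):
    """Determinant of a 3x3 matrix given as a list of three rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def inverse3(m):
    """Inverse of a 3x3 matrix by the cofactor (adjugate) formula."""
    d = det3(m)
    if d == 0:
        raise ValueError("mixing matrix is singular; mixes are not independent")

    def minor(i, j):
        rows = [row for k, row in enumerate(m) if k != i]
        (p, q), (r, s) = [[v for k, v in enumerate(row) if k != j] for row in rows]
        return p * s - q * r

    # Entry [j][i] of the inverse is cofactor (i, j) divided by the determinant.
    return [[Fraction((-1) ** (i + j) * minor(i, j)) / d for i in range(3)]
            for j in range(3)]


def recover_stems(mix_matrix, mixes):
    """Recover (A, B, C) from the three transmitted mixes at one instant."""
    inv = inverse3(mix_matrix)
    return [sum(inv[r][c] * mixes[c] for c in range(3)) for r in range(3)]


def remix(stems, weights):
    """Form a new mix a0*A + b0*B + c0*C from the recovered stems."""
    return sum(w * s for w, s in zip(weights, stems))
```

For example, with rows (1, 1, 1), (1, 0, 0) and (0, 1, 0) the main channels carry A + B + C and the buried channels carry A and B, from which C is recovered by subtraction.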

This mix down method can be used also for consumer music
releases where it is desired to give to the public the
ability to produce modified mixes other than the standard
mix of the main audio stereo channels.

One application of this ability to provide alternative
mixes is the possibility of providing a choice of languages
for vocals in a music release aimed at a multi-lingual
market, with the main channels conveying one language,
and the subsidiary channels conveying, for example, the
difference between the vocals in the first and in a second
language. Subtracting this "difference" vocal channel
from the main channels will produce a track with vocals in
the second language, while still retaining the full
quality of the main channel for all the backing musical
lines.

1.2 Application to multichannel sound

One application of the new data channel is using the
additional bits to add, using audio data compression,
additional audio channels for three- or more-speaker
frontal stereo or surround sound as shown in Figure 16,
such as described for example in [6], [7], [8]. In using the
buried channel to transmit additional directional audio
channels, it is important to design the codec error signals
so that they do not become audible through the mechanism of
directional unmasking described in three of the inventor's
references [9], [10], [11].

The data rate available is sufficient to transmit a Dolby
AC-3 or MUSICAM 5-channel surround-sound signal,
but these systems involve a quality compromise with the
data rate, so that this is not a preferred procedure.

High-quality data compressed additional audio channels can,
unlike existing data compression systems, minimize the risk
of destruction of subtle auditory cues such as those for
perceived distance, thereby maintaining CD digital audio as
the preferred medium for high quality audio, while adding
additional channels. For high quality (and especially
musical) use, it may be preferred to use additional buried
audio channels either for frontal-stage 3- or 4-speaker
stereo or for 3-channel horizontal or 4-channel full-sphere
with height [13] ambisonic surround sound (see refs.
[7], [8], [15]), rather than for the rather cruder theatrical
"surround-sound" effects considered appropriate for cinema
or video-related surround-sound systems. However, systems
have been proposed for intercompatible use of both kinds of
system [7], [8].

Since the main audio channels in this proposal convey
high-quality audio, it is possible to use the spectral
envelope of the main audio channels to convey most or all
of the dynamic ranging information used for the subbands in
data reduction systems for related subsidiary channels
conveyed in the buried data channel, especially if the main
audio channels incorporate a mixture of all the transmitted
channels so that no direction is canceled out. This saves
the data overhead of conveying ranging data, which in high
quality systems may save of the order of 60 kbit/s, as
compared to a stand-alone data compression system. This
will allow a system conveying n related channels using 4
LSBs per main CD audio channel to give a performance
equivalent to that of a stand-alone data compression system
conveying n-2 channels in about 420 kbit/s. For 3-channel
systems, such as horizontal B-format surround-sound or
3-channel UHJ [15] or frontal-stage 3-channel stereo, this
quality is unlikely to be audibly distinguishable from an
uncompressed data channel, and for 4-channel systems, the
results will still subjectively approach that of critical
studio-quality material, and even for 5-channel material,
the results will be considerably less compromised than that
for DAB or cinema surround-sound, using a data rate for the
additional channels of well over twice that used in those
applications.

1.3 Video and computer data 

Alternatively, the buried data channel can be used for
conveying related computer data, such as graphics, data
files, computer games, multilingual text or track copyright
information, and a data rate of 350 kbit/s is even enough to
convey a reasonable video image by using a video data
reduction system such as MPEG.

1.4 Dynamic range data

Another use would be to convey dynamic-range reduction or
enhancement data, e.g. a channel conveying the setting of
a gain moment by moment. This would allow the same CD
automatically to be played with different degrees of
dynamic compression according to environment, by choosing
the gain adjustment channel appropriate to that
environment. This would include the possibility of
completely uncompressed quality for high-quality use. It
is known in the prior art to add buried data to a CD to
control the switching on or off of a dynamic expander to
alter the dynamic range of sounds, or to control the
characteristics of such a dynamic expander. The present
proposal differs in that the buried data channel is used to
convey a signal representing the actual gain to which the
audio waveform signal is to be subjected moment by
moment. By this means, the gain need not be rigidly
specified by any particular design of compressor or
expander, but may be chosen freely from that derived by
many different kinds of compressors or expanders, or even
derived from manual gain adjustment by an artistically
skilled operative.

The gain signal may be conveyed by any known method. For
example, it might be conveyed using say 12 successive bits
in a data signal to convey using PCM the value of the gain
control signal to 12 bit resolution with a bandwidth
limited to the Nyquist limit of the sample rate of the
12 bit words. Preferably, the gain waveform will be coded
in the data stream using Differential PCM (DPCM) techniques
rather than PCM techniques, since this will generally
convey the gain control signal with a higher resolution at
a given data rate. Well-known techniques of efficient data
transmission such as Huffman coding may be used to maximise
the gain control signal resolution within the available
data rate.

The decoder will recover the buried data as described in
the following, will recover from this, by DPCM and Huffman
decoding as appropriate, the original gain control signal,
and will then alter the gain of the main audio channels by
multiplying these channels by the value of the gain control
signal as shown schematically in Figure 17.
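
The gain-control path can be sketched as follows. This is a minimal illustration under several assumptions not stated in this description: the gain is held as a 12 bit code in which `unity_code` represents unity gain, the DPCM differences are transmitted without further quantization, one gain value is supplied per audio sample, and the Huffman coding stage is omitted. All names are hypothetical.

```python
def dpcm_encode(gain_codes):
    """Encode a gain waveform as differences between successive values."""
    previous = 0
    diffs = []
    for g in gain_codes:
        diffs.append(g - previous)
        previous = g
    return diffs


def dpcm_decode(diffs):
    """Rebuild the gain waveform by accumulating the differences."""
    gain = 0
    out = []
    for d in diffs:
        gain += d
        out.append(gain)
    return out


def apply_gain(samples, gain_codes, unity_code=2048):
    """Multiply each audio sample by its gain value (unity_code means x1)."""
    return [round(s * g / unity_code) for s, g in zip(samples, gain_codes)]


if __name__ == "__main__":
    gains = [2048, 2040, 2000, 1024]
    diffs = dpcm_encode(gains)  # small numbers when the gain changes slowly
    print(diffs, apply_gain([1000, -1000, 500, 400], dpcm_decode(diffs)))
```

A slowly varying gain gives small differences, which is what allows an entropy code such as Huffman coding to spend fewer bits on them than direct PCM would.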

Other continuous control signals can be conveyed
digitally by the buried data channel in a similar manner,
for example by transmitting the data in one or more
channels of MIDI (Musical Instrument Digital Interface)
control information. These MIDI control signals can be
used to adjust the reproduction parameters of the main
audio channels by means of MIDI controlled gains, panpots
and equalisers, reverberation units and similar effects
devices in order to produce desired alterations for special
reproduction purposes.

Such MIDI or similar control signals in the buried data
channel can additionally or instead be used to cause the
performance of MIDI-controlled synthesiser sound modules
for the purposes of adding additional musical lines to
those conveyed in the main audio waveform of the CD.

1.5 Frequency Range Extension

A further use related to the original audio is shown
schematically in Figure 18 and is to add in the subchannel
data-reduced information allowing information above 20 kHz
to be reconstructed. It is widely noted that there is a
significant loss of perceived quality caused by the sharp
bandlimiting to 20 kHz when comparing high-quality digital
signals sampled at say 44.1 kHz as compared to 88.2 kHz.

From a quality viewpoint, it may be more important to use
an extended bandwidth to provide a more gentle roll-off
rate than to provide a response flat to 40 kHz, since
(unlike the brickwall filters used with ordinary CD), such
gentler roll-offs are similar to those encountered in
natural acoustical situations.

The extended bandwidth can be provided, for example, by
using a high-order complementary mirror filter pair of the
kind described in Regalia et al. [20] and in Crochiere and
Rabiner [21] to split an 88.2 kHz-rate sampled digital
signal into two bands sampled at 44.1 kHz. The filters
involved will overlap, although using a high-order filter
[20], the region of significant overlap can be reduced to
of the order of a kHz. Within the overlap region there will
be aliasing from the other frequency range, although the
reconstruction of the full bandwidth [20,21] will cancel
out this aliasing. The band below 22.05 kHz can then be
transmitted as the conventional audio, and the band above
22.05 kHz can be transmitted in data reduced form in the
buried data channel at a reduced data rate of, say, between
1 and 4 bits per sample per channel, using known sub-band
or predictive coding methods. This arrangement is
illustrated in Figure 18. Phase compensation inverse to
the phase response of the low pass filter in the
complementary filter pair may be employed to linearise the
phase response of the main sub-22.05 kHz signal for
improved results for standard listeners, with the use of an
inverse phase compensating filter in the decoding process
for reconstructing the wider bandwidth signal.
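
The principle of splitting a signal into two half-rate bands whose aliasing cancels on reconstruction can be shown with the lowest-order mirror filter pair. This sketch uses the two-tap (Haar) pair, which is an assumption made purely for brevity: its bands overlap far more than those of the high-order filters of [20], so it illustrates only the alias cancellation and not a usable band split. The function names are hypothetical.

```python
def split_bands(x):
    """Split a signal into low and high bands at half the sample rate."""
    if len(x) % 2:
        raise ValueError("signal length must be even")
    low = [(x[i] + x[i + 1]) / 2 for i in range(0, len(x), 2)]
    high = [(x[i] - x[i + 1]) / 2 for i in range(0, len(x), 2)]
    return low, high


def merge_bands(low, high):
    """Recombine the two half-rate bands into the full-rate signal."""
    x = []
    for lo, hi in zip(low, high):
        x.extend((lo + hi, lo - hi))
    return x


if __name__ == "__main__":
    signal = [3, 1, 4, 1, 5, 9, 2, 6]
    low, high = split_bands(signal)
    print(low, high, merge_bands(low, high))
```

Each band on its own contains aliased components from the other, yet recombining the two restores the original samples exactly.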

The potential quality problem caused by aliasing within the
main audio waveform may be avoided by conveying a lower
frequency range via a low pass filter that has
substantially zero response above 22.05 kHz, and a higher
frequency range in data-reduced form that includes some
overlap of frequency range with the band below 22.05 kHz.

1.6 Combined applications

Any or all of these uses can, of course, be combined,
subject only to the restrictions of the data rate, so that
the buried data channel could be used for example to convey
one additional audio channel, a dynamic range gain signal,
extended bandwidth and additional graphics, text (possibly
in several languages), copyright and even insert video data
as appropriate.

1.7 Other applications 

Although for simplicity of exposition, much of the
description of the invention is discussed in detail in the
case of digital words representing audio signal waveforms
as discrete time series and with respect to compact disc
audio, it will be appreciated that the invention is not
confined to this application, but may equally be applied to
other cases such as video or image waveforms, or waveforms
representing analog data such as seismographic or
electroencephalographic waveforms, or to digital words
representing waveforms in a transform domain such as a
Fourier or cosine or sine transform or Hilbert transform
domain, or the domains produced by the invertible sub-band
or discrete transforms used in waveform coding
applications.

Particularly in remote sensing applications where data
waveforms have to be transmitted via a limited or expensive
communications link, for example data sent from a space
probe or satellite or a sensor in an oil well drill or a
sensor used to gather meteorological information for use in
weather forecasting, the invention may be used to convey
other data in the least significant digits of the waveform
data, while minimally affecting the noise performance in
the waveform data.

Other aspects, embodiments, objects, uses and advantages of
the invention will be apparent from the description and
claims.

BRIEF DESCRIPTION OF THE DRAWINGS 
3 5 Embodiments of the invention will now be described by way 
of example, and the theoretical background to the invention 
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discussed, with reference to the accompanying drawings in 
which ; - 



Figure 1 shows pseudo random encoding and decoding of 
5 data transmitted via a digital channel to ensure noise-like 
behavior . 

Figure 2 shows a binary pseudor random . sequence, generator . 
using shift-register logic, with input "exclusive or" gate 
for encoding and decoding of a binary data stream. 
10 Figure 3 shows a schematic of processing of data to form 

an audio noise-like signal. 

Figure 4 shows subtractive dither around a uniform 
quantizer. 

Figure 5 shows subtractive dither using a combination of
discrete and continuous RPDF dither.

Figure 6 shows a noise shaped subtractively dithered 
uniform quantizer. 

Figure 7 shows an "outer" form equivalent to that of fig.
6 for a noise-shaped subtractively dithered uniform
quantizer, where $H'(z^{-1}) = H(z^{-1})/(1 - H(z^{-1}))$.

Figures 8a and 8b show noise shaping round pseudo-random
data noise signal encoding of data into an audio word using
the standard noise shaper form and round a modified
process.

Figures 9a and 9b show noise shaping round pseudo-random
data noise signal encoding of data into an audio word using
an "outer" noise shaper form equivalent to fig. 8 if
$H'(z^{-1}) = H(z^{-1})/(1 - H(z^{-1}))$, and round a
modified process.

Figure 10 shows a further implementation of noise shaping
round pseudo-random data noise signal encoding of data into
an audio word.

Figure 11 shows the recovery of the data signal from the 
received coded audio word. 

Figure 12 shows the 2-dimensional rhombic quantizer
region (shaded square with sides tilted 45°) shown against
a background (squares with horizontal and vertical sides)
of conventional independent quantizers (whose square
quantizer region is darkly shaded) on each channel.

Figure 13 shows the use of extra subtractive dither to
eliminate nonlinear distortion and modulation noise at LSB
level, using noise shaped triangular PDF dither having ±1
LSB peaks to achieve good results in both nonsubtractive
reproduction of output audio word and (shown) subtractive
reproduction.

Figures 14 and 15 show the use of autodither to generate
triangular dither in the encoder and audio decoder.

Figure 16 shows the encoding of three or more audio
channels as a pair of normal audio channels on CD with the
remaining information conveyed using audio data compression
in the least significant digits, and the recovery of these
channels and their mixing or combining to form output audio
channels, either for mixdown use or for multi-channel
directional sound reproduction.

Figure 17 shows the encoding and the decoding, as least
significant bit buried data, of a signal intended for
optional gain alteration of the reproduced CD sound.

Figure 18 shows the encoding and the decoding of
information coded in data-reduced form in the least
significant bits for the increase of audio bandwidth beyond
the 20 kHz limits of conventional CD.

DESCRIPTION OF EXAMPLES 

A signal processing circuit for encoding data within a
digitised signal waveform comprises a pseudo-random encoder
1 and a uniform quantizer 2. The pseudo-random encoder
applies a reversible pseudo-random function to the input
data so that the output is noise-like. The output from the
pseudo-random encoder 1 is subtracted from an input
digital word representing a signal waveform. After
subtraction, the modified waveform word is quantized by the
quantizer which in this example has a uniform quantization



wo 94/18762 PCT/GB94/00297 


characteristic. A noise-shaping loop is provided around 
the quantizer 2 and the data noise subtraction node. 



This circuit, and the alternative circuits discussed
below, are conveniently implemented by means of signal
processing algorithms programmed and implemented in ways
well known to those skilled in the art on general purpose
digital signal processing chips, such as those in the
Motorola DSP 56000 family such as the DSP56001 or DSP56002,
or chips of the Texas Instruments TMS320 family, although
any digital signal processing hardware capable of
performing the required arithmetic and logic operations may
be used, including programmable logic chips and arithmetic
logic units and general purpose central processors used in
computers. Logic algorithms for pseudo-random encoding and
decoding of data, such as described below in connection
with Figures 2a and 2b, have in themselves been well-known
for over twenty years in the prior art, and may be
implemented by using standard digital logic elements to
implement Boolean logic operations.

When used with general purpose digital signal processing
chips, the data encoding algorithms are implemented as
programs stored in program memory, operating on a time
series of digital input words representing waveform signals
and on data signals, and providing a time series of digital
output signal words representing modified waveform words
incorporating data signal information. In a similar way,
when used with general purpose digital signal processing
chips, the data decoding algorithms are implemented as
programs stored in program memory, operating on a time
series of digital input words representing waveform signals
incorporating data signals, and providing output data
signal information.

The digital waveform signal being processed will usually
have been derived either by passage through an analogue-to-
digital converter from analogue waveform signals, or from
signals directly synthesised in the digital domain. The
output waveform words may be converted to analogue
waveforms by use of a digital-to-analogue converter. Where
in this description the data signal is a data-reduced
signal for analogue waveforms, such as data-compressed
audio, image or video signals, any available hardware or
software method may be used to encode the extra waveform
information into a data-reduced form, and to decode the
recovered buried data signal back into a waveform signal.
Such hardware and software methods of encoding are
available commercially for many data-reduction systems such
as the MUSICAM, ISO/MPEG and Dolby AC2 and AC3 and APTX-100
and ATRAC systems for data-reduced audio signals, and for
the JPEG method of data reduction for image signals and the
MPEG method of data reduction for video signals.

2. Pseudo-Random Coding of Data

2.1 Pseudo-randomized data

It is desirable, if the LSBs of an audio signal are to be
replaced by data, that the replacing data should truly
resemble a random noise signal (albeit perhaps one that may
be spectrally shaped for psychoacoustic reasons). Most
data signals, when listened to as though they were digital
audio signals, have some degree of systematic pattern which
may well prevent them from sounding or behaving truly like
random noise. Such departures from random noise-like
behavior are generally much more perceptually disturbing or
distracting than a simple steady noise.

Also, if one can ensure that added data behaves like a noise
signal with known statistical properties, one can use all
that is known in the literature on dither and noise shaping
(see [1]-[4], [16]-[19]) to optimize the perceptual
properties of the added data to minimize its audible
effects.

The data signal is rendered pseudo-random with predictable
statistics in our proposal by a data encode/decode process,
the encode process having the effect of pseudo-randomizing
the data signal, and the decode process having the effect
of recovering the original data signal from the pseudo-
randomized data signal, as in figure 1. From a practical
point of view, it is highly desirable that the encode and
decode process require no use of an external synchronizing
signal, but that the decode process should work entirely
from the pseudo-randomized data sequence itself.

The simplest way of constructing such an encode/decode
pseudo-randomizing process for data is to use a cyclic
pseudo-random logic sequence generator separately on each
bit. For example, if its input is zero, fig. 2 shows a
well-known binary pseudo-random logic sequence generator
using feedback around three logic elements and a total
shift register delay of 16 samples (a 1-sample delay is
denoted by the usual notation $z^{-1}$). Provided that the
logic state in the 16 samples stored in the shift register
is not all zero, this binary sequence generator has the 16
logic states cycle through all $2^{16}-1 = 65,535$ non-zero
states in a pseudo-random manner.

If, instead of using a zero input, the pseudo-random
sequence generator of fig. 2 is fed with a binary data
stream $s_n$, then it has the effect of a pseudo-randomizer
for the input data. This encoding scheme is based on the
recursive logic

$$t_n = s_n \oplus t_{n-1} \oplus t_{n-3} \oplus t_{n-12} \oplus t_{n-16} , \tag{2-1}$$

where $t_n$ is the output binary logic value of the network
at integer sample time n, $s_n$ is the input binary logic
value of the network at integer sample time n, and $\oplus$
represents the logic "exclusive or" or Boolean addition
operator (with truth table $0 \oplus 0 = 1 \oplus 1 = 0$,
$0 \oplus 1 = 1 \oplus 0 = 1$).

Conversely, if exactly the same arrangement of logic gates
is fed with the pseudo-randomized data $t_n$, then the effect
of the "exclusive or" gates on the input signal is to
restore the original data stream. This is achieved by the
inverse decoding logic process

$$s_n = t_n \oplus t_{n-1} \oplus t_{n-3} \oplus t_{n-12} \oplus t_{n-16} , \tag{2-2}$$

illustrated in the second diagram in fig. 2.

Thus by using a logic network recursively with a total of
L = 16 samples delay and only 4 "exclusive or" gates, a
binary data stream can be pseudo-randomized, and the same
network can decode the data stream back to its original
form. For constant signals, there is a one in 65,536
chance that the undesirable non-random zero state will be
encountered, but this low probability is probably
acceptable, given that even a single binary digit change of
input is likely to "jog" the system back into a pseudo-
random output state.
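
The encode and decode recursions of Eqs. (2-1) and (2-2) can be sketched in a few lines of Python. This is only an illustrative sketch: the tap delays (1, 3, 12, 16) are an assumption read from the damaged equations, and the function names `scramble` and `descramble` are hypothetical.

```python
# Feedback delays assumed from Eqs. (2-1)/(2-2); the source text is OCR-damaged.
TAPS = (1, 3, 12, 16)

def scramble(bits, taps=TAPS):
    """Encoder: t[n] = s[n] XOR t[n-1] XOR t[n-3] XOR t[n-12] XOR t[n-16]."""
    hist = [0] * max(taps)          # hist[-k] holds t[n-k]; starts all zero
    out = []
    for s in bits:
        t = s
        for k in taps:
            t ^= hist[-k]
        out.append(t)
        hist = hist[1:] + [t]       # shift the new output bit into the register
    return out

def descramble(bits, taps=TAPS):
    """Decoder: s[n] = t[n] XOR t[n-1] XOR ... using only received bits."""
    hist = [0] * max(taps)          # hist[-k] holds the received t[n-k]
    out = []
    for t in bits:
        s = t
        for k in taps:
            s ^= hist[-k]
        out.append(s)
        hist = hist[1:] + [t]       # the register holds received bits, not decoded ones
    return out
```

Because the decoder's register is filled only from received bits, it needs no external synchronization, and a single corrupted received bit can disturb at most five decoded bits (at delays 0, 1, 3, 12 and 16).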


Other well-known pseudo-random binary sequence logic
generators with shift registers of longer length L than 16
samples can be used for encoding and decoding in the same
way, with their fed-back output given by subjecting the
delayed sequence output and the input to a "sum" logic
gate. Such length L sequences will have, for a constant
input, only one chance in $2^L-1$ of giving an unrandomised
output, and will have a sequence length of $2^L-1$ samples.

Although the pseudo-random binary sequence generator
described in (2-1) and fig. 2 is a maximum length sequence
for a zero input, it has a shorter length for an all-one
constant input, and in general, the precise behavior with,
say, periodic inputs is hard to predict. Partly for this
reason, it is not absolutely essential to use a maximum-
length sequence generator, provided that the length of the
sequence is not too short for constant inputs.

It will be noted that the network of fig. 2 only has L = 16
samples of memory, so that when used as a decoder, any data
errors in the input will only propagate for L samples, and
then the output will recover. This lack of long-term
memory in the decoding process means that there are no
special requirements on the error-rate of the transmission
channel. Because of the small number of logic elements in
fig. 2, a single sample error in the received data stream
will only cause five sample errors in the decoded output.

As shown in fig. 3, typically, for use with CD, the data
will first be arranged to form a number of bits of data per
sample of each audio channel, for example 8 bits of data
constituting bits 12 to 15 of the left and the right audio
channels (where bit 0 is the most significant bit (MSB) of
a 16 bit audio word and bit 15 the least significant bit).

Then each of these (say 8) bits will, separately, be
encoded by a pseudo-random logic such as that of fig. 2 to
form a pseudo-random sequence, and the resulting pseudo-
randomized bits used to replace the original bits in (say)
bits 12 to 15 of the left and the right audio channels.
The resulting noise signals in the left and right audio
channels will be termed the (left and right) data noise
signals.

Alternatively, instead of pseudo-randomizing individual
bits of the audio words representing data separately, they
can be pseudo-randomized jointly by regarding the
successive data bits of a word as being ordered
sequentially in time, and applying a pseudo-random encoder
such as that of figure 2 to this sequence of bits. For
example, eight bits of data per audio sample can be
sequentially ordered before the next eight bits of data
corresponding to the next audio sample, and the pseudo-
random logic encoding can be applied to this time series of
bits at eight times the audio sampling rate.

An advantage of this strategy is that errors in received
audio samples propagate (in this example) for only one
eighth of the time that they would in the case where each
word bit is separately pseudo-randomized.

M-level data signals, taking one of M possible values and
conveying $\log_2 M$ bits per sample, can also be pseudo-
randomized by a direct process involving congruence
techniques, whereby the coded version $w'_n$ of the current
sample M-level word $w_n$ is given by

$$w'_n = w_n + \sum_{j=1}^{L} a_j w'_{n-j} \pmod{M} , \tag{2-3}$$

where the $a_j$'s are (modulo M) integer coefficients chosen
(if necessary by empirical trial-and-error) to ensure that
all M possible constant inputs result in a pseudo-
randomized output with reasonably long sequence lengths.
The inverse decoding of the pseudo-randomized M-level words
is

$$w_n = w'_n - \sum_{j=1}^{L} a_j w'_{n-j} \pmod{M} . \tag{2-4}$$



The logic techniques described with reference to figure 2
are just the special case when M = 2 of this more general
congruence technique. The congruence technique can result
in sequence lengths for constant inputs of length up to a
maximum of $M^L-1$ samples, so that in general, the larger the
value of M, the smaller need be L, with a consequent
shortening of the time duration of propagation of errors.
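
A minimal Python sketch of the M-level congruence technique follows. It assumes that Eq. (2-3) has the feedback form w'[n] = w[n] + sum_j a_j w'[n-j] (mod M), by analogy with Eq. (2-1), and that decoding subtracts the same sum. The function names and the example coefficients are hypothetical and not taken from the source.

```python
def congruence_encode(words, coeffs, M):
    """w'[n] = (w[n] + sum_j a_j * w'[n-j]) mod M, starting from zero history."""
    hist = [0] * len(coeffs)        # hist[-j] holds w'[n-j]
    out = []
    for w in words:
        fb = sum(a * hist[-j] for j, a in enumerate(coeffs, start=1))
        c = (w + fb) % M
        out.append(c)
        hist = hist[1:] + [c]
    return out

def congruence_decode(coded, coeffs, M):
    """w[n] = (w'[n] - sum_j a_j * w'[n-j]) mod M, using only received words."""
    hist = [0] * len(coeffs)        # hist[-j] holds the received w'[n-j]
    out = []
    for c in coded:
        fb = sum(a * hist[-j] for j, a in enumerate(coeffs, start=1))
        out.append((c - fb) % M)
        hist = hist[1:] + [c]
    return out
```

For example, with M = 6 and the illustrative coefficients `[1, 0, 5]`, `congruence_decode(congruence_encode(words, [1, 0, 5], 6), [1, 0, 5], 6)` returns the original `words`. An error in one received word affects at most L + 1 = 4 decoded words.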

A slightly more complex pseudo-randomization of data will
provide an initial pseudo-randomization of M-level data by
a method such as one of those described here, and follow it
by an additional one-to-one map between the M possible data
values. The decoding will first subject the M levels to an
inverse map before applying the inverse of the above
pseudo-random encodings.

There are many similar but more complicated methods of
pseudo-randomization of data streams, and as we have seen,
these need have no coding delay or increase in data rate
after coding, and can limit the duration of any errors in
received data in the inversely decoded output to not more
than a few samples after the occurrence of an erroneous
audio sample.



As audio signals, the resulting pseudo-randomized data
noise signals have a steady white noise spectrum and a
(discrete) uniform or rectangular PDF (probability
distribution function), in the example case described above
having 16 levels in each of the left and right channels.
Such discrete noise does not have the ideal properties of
rectangular dither noise, although Wannamaker et al. [16]
have shown that it approximates many of these desirable
properties in a precise mathematical sense. However,
adding to it an extra random or pseudo-random white
rectangular PDF noise signal with peak level ± ½ LSB
converts it into noise with a true rectangular PDF with
peak levels (in this example) of ± 8 LSB. In this case the
added noise to convert from a discrete to a continuous PDF
is at a very low level, being 24 dB below the level of the
data noise signal.
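
The 24 dB figure can be checked with a short calculation, taking the added noise as rectangular with peak ½ LSB and the resulting data noise as rectangular with peak 8 LSB (16 levels), so that the ratio of RMS levels equals the ratio of peak levels:

```latex
20\log_{10}\!\left(\frac{8\ \mathrm{LSB}}{\tfrac{1}{2}\ \mathrm{LSB}}\right)
  = 20\log_{10} 16 \approx 24.1\ \mathrm{dB}.
```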

2.2 Stereo parity coding

Although in the above example, we have described data being
conveyed on each audio word bit of the data signal
separately, it will be realized that data can alternatively
be conveyed by more complicated combinations of the least
significant digits (in any numerical base M, not just the
binary base 2) of audio words, for example on the Boolean
sum of the corresponding bit in the left and right audio
signal.

For example, consider the case that a data rate of only one
bit per stereo audio sample is required. Such a signal can
be conveyed as the Boolean sum of the LSB in the left and
the right audio channels, leaving the values of the LSB in
individual audio channels separately unconstrained.
Conveying a data channel using the Boolean sum of the
corresponding bits of the left and right audio signals is
herein termed stereo parity coding.



It is of course desirable that the effect on the
conventional audio of reallocating bits to a buried data
channel should be left/right symmetrical. In particular,
if a buried data channel is used with a data rate of just
one BPSS (bit per stereo sample), then one does not wish to
code the data in the LSBs of only one of the two stereo
channels. If the values of the respective N'th bits of the
respective left and right channel signals are denoted by
$L^N_n$ and $R^N_n$ at time n, then one codes a pseudo-
randomized one bit per sample data channel $t^N_n$ as

$$t^N_n = L^N_n \oplus R^N_n . \tag{2-5}$$

If desired, an additional second pseudo-randomized one bit
per sample data channel $u^N_n$ can be encoded in the N'th
bits of the stereo audio signal, say as

$$u^N_n = L^N_n , \tag{2-6}$$

in which case the data can be encoded via $L^N_n = u^N_n$,
$R^N_n = L^N_n \oplus t^N_n$, and decoded via
$u^N_n = L^N_n$, $t^N_n = L^N_n \oplus R^N_n$.
Alternatively $u^N_n$ can be encoded as $R^N_n$. The use of
stereo parity encoding allows the separate one BPSS data
channels to be separately decoded while maintaining
left/right symmetry in the audio when an odd number of one
BPSS channels are used.
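
As a hedged illustration of Eqs. (2-5) and (2-6), the following Python sketch encodes two one-bit data channels t and u into one bit position of a stereo sample pair and recovers them. The function names are hypothetical, and `bit` here counts upward from the least significant bit (so `bit=0` is the LSB), which differs from the MSB-first numbering used elsewhere in this description.

```python
def encode_pair(left, right, t, u, bit=0):
    """Set L's chosen bit to u and R's chosen bit to (L bit XOR t)."""
    mask = 1 << bit
    left = (left & ~mask) | (u << bit)               # L^N = u
    l_bit = (left >> bit) & 1
    right = (right & ~mask) | ((l_bit ^ t) << bit)   # R^N = L^N xor t
    return left, right

def decode_pair(left, right, bit=0):
    """Return (t, u): t is the stereo parity of the chosen bits, u is L's bit."""
    l_bit = (left >> bit) & 1
    r_bit = (right >> bit) & 1
    return l_bit ^ r_bit, l_bit
```

A decoder that only knows about the parity channel can still read t without any knowledge of u, which is the property the text relies on for separately decodable channels.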



One could standardize a basic one BPSS data channel as
being conveyed via the parity (Boolean sum) of the LSBs
(i.e. bit 15) of the left and right audio channels.
Information about the way other data channels conveying
more BPSS are coded will, in such a standardization, be
conveyed by this basic data channel. By this means, a
data decoder can read from the basic one BPSS stereo parity
data channel how to decode any other data channels (if any)
present. In particular, this allows if desired moment-by-
moment variation of the data rate, either adaptively to the
amount of data needing transmission or adaptively to the
audio signal according to its varying ability to mask the
error signal caused by the hidden data channels.

For example, in loud passages in pop/rock music, the data
rate allocated to say a video signal could be increased,
allowing quite high quality video images in, say, heavy
metal music.

2.3 Fractional bit rates

There is no reason why the buried data channels should be 
restricted to data rates of an integer number of BPSS, 
although this may be a convenient implementation. Several 
methods can be used to allocate less significant parts of 
audio words to data at fractional bit rates. 

One method conveys $\log_2 M$ bits for integer M in the less
significant parts of audio words by conveying data in the
M possible values of the remainder of the integer audio
word after division by M, whereas the rounding quantization
process used for the audio involves rounding to the nearest
multiple of M. For M a power of 2, this reduces to
conventional quantization to $\log_2 M$ fewer bits.
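
The remainder-modulo-M idea can be sketched as follows for integer audio words. This is a simplified, undithered sketch: the audio is rounded to the nearest multiple of M and the data value d (assumed here to lie in 0..M-1) is carried as the remainder. The function names are hypothetical.

```python
import math

def embed_mod_m(x, d, M):
    """Round integer word x to the nearest multiple of M, then carry d as remainder."""
    q = ((2 * x + M) // (2 * M)) * M     # floor(x/M + 1/2) * M, in integer arithmetic
    return q + d

def extract_mod_m(word, M):
    """Recover the data value as the remainder after division by M."""
    return word % M

def bits_per_sample(M):
    """Data capacity of one M-level symbol, in bits."""
    return math.log2(M)
```

With M = 3, for instance, each sample carries `bits_per_sample(3)`, roughly 1.585 bits, which is a non-integer rate that plain bit replacement cannot provide.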

In Eqs. (2-3) and (2-4) above, we described how such M-
level data channels can be pseudo-randomized by pseudo-
random congruence encoding and decoding.

Alternatively, if M can be expressed as a nontrivial product
of K = two or more integer factors

$$M = \prod_{j=1}^{K} M_j ,$$

then one can uniquely expand the M-level data word w in
the form

$$w = \sum_{j=1}^{K} w_{[j]} \prod_{i=1}^{j-1} M_i \tag{2-7}$$

with $w_{[j]}$ an integer between 0 and $M_j-1$. Eq. (2-7) is
the generalization of the expansion of a number to base
$M_0$ in the case $M_j = M_0$ for all j = 1,...,K. Each of
the expansion coefficients $w_{[j]}$ can, if desired, be
separately pseudo-randomized before the final length M word
is formed. Again, this generalizes the binary case
described above where the $M_j$'s equaled 2.
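
The factorised expansion can be illustrated with a mixed-radix split and join in Python. The digit ordering used here (first factor least significant) is one conventional choice and is an assumption, since the original display equation is not legible in this copy; the function names are hypothetical.

```python
def mixed_radix_split(w, radices):
    """Split w (0 <= w < product of radices) into digits, first radix least significant."""
    digits = []
    for m in radices:
        w, d = divmod(w, m)
        digits.append(d)            # 0 <= d <= m - 1
    return digits

def mixed_radix_join(digits, radices):
    """Rebuild w from its mixed-radix digits (inverse of mixed_radix_split)."""
    w, scale = 0, 1
    for d, m in zip(digits, radices):
        w += d * scale
        scale *= m
    return w
```

For M = 30 = 2 x 3 x 5, `mixed_radix_split(17, [2, 3, 5])` gives `[1, 2, 2]`, since 17 = 1 + 2*2 + 2*6. Each digit stream could then be pseudo-randomized separately before the word is re-formed.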

A second method for fractional bit rates, especially
suitable for very low data rates of 1/q BPSS for integer q,
is to code data only in one out of every q audio samples.
The encoding schemes are as before but with a data sampling
rate divided by q, and decoding involves the decoder trying
out and attempting to decode each of the q possible sub-
sequences until it finds out (e.g. by confirming a parity
check encoded into the data) which one carries data.

For integers p < q, a data rate of p/q BPSS can similarly
be obtained by encoding data in the LSBs of p out of every
q samples (for example, samples 1 and 3 out of every
successive 5 samples for p = 2 and q = 5).

A third method for fractional bit rates also codes data in
the LSBs of q successive samples, but codes the data into
different logical combinations of all q bits. For example,
a data rate of 1/q BPSS can be obtained by encoding data as
the parity (Boolean sum) of the q LSBs. It turns out that
this option is often capable of significantly less audio
noise degradation than the simpler scheme of the second
method. A part of the advantage is that if one needs to
modify the parity, then one can choose to modify that
sample out of the q successive samples causing the least
error in an original high-resolution audio signal, rather
than being forced to alter a fixed sample.
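
A minimal Python sketch of this third method follows. It assumes high-resolution input samples expressed in units of one output LSB, rounds each to the nearest integer, and, when the parity of the q LSBs does not match the data bit, moves to its second-nearest integer the one sample for which that move costs the least additional error. The function names are hypothetical.

```python
import math

def encode_group(samples, bit):
    """Round q high-resolution samples so that the parity of their LSBs equals `bit`."""
    rounded = [math.floor(x + 0.5) for x in samples]
    if sum(rounded) % 2 != bit:
        # The cheapest sample to move is the one already furthest from its
        # rounded value, because its second-nearest integer is then closest.
        i = max(range(len(samples)), key=lambda k: abs(samples[k] - rounded[k]))
        rounded[i] += 1 if samples[i] >= rounded[i] else -1
    return rounded

def decode_group(words):
    """Parity (Boolean sum) of the LSBs of the q received words."""
    return sum(words) % 2
```

For the group `[10.1, -3.45, 7.0, 2.2]` and data bit 1, plain rounding gives even parity, so the second sample is moved from -3 to -4 (an error of 0.55 LSB rather than a full LSB on a fixed sample).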

We shall see in the following that, for all three kinds of
fractional bit rate data encoding, it is possible to use a
subtractive dithering technique by a data noise signal to
eliminate unwanted modulation noise and distortion side
effects on the modified waveform data. The advantages of
the new process are not confined to integer bit rates per
sample.

3. Subtractively dithered noise shaping 
3.1 Subtractive dither

Here we briefly review the ideas of subtractively dithered 
noise shaping, detailed by the inventors in refs. [1], [3] 
and [4] . In this description, by a "quantizer" we mean a 
signal rounding operation that takes higher resolution 
audio words and rounds them off to the nearest available 
level at a lower resolution. We assume that the quantizer 
is uniform, i.e. that the available quantization levels are 
evenly spaced, with a spacing or step size denoted as STEP. 

The quantizer rounding process introduces nonlinear
distortion, but this distortion may be replaced by a benign
white noise error at the same typical noise level by using
the process of subtractive dither shown in figure 4. The
process comprises adding a dither noise before the
quantizer and subtracting the same dither noise afterwards.
Provided that the statistics of the dither noise are
suitable, it can be shown (see [1], [2]) that this results
in the elimination of all correlations between the error
signal across the subtractively dithered quantizer and the
input signal. One suitable dither statistic is what
we term RPDF dither, i.e. dither each of whose samples is
statistically independent of other samples and with a
rectangular probability distribution function (PDF) with
peak levels ± ½ STEP.
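
The subtractive dither process of figure 4 can be sketched numerically as below. This is an illustrative sketch only: `quantize` and `subtractive_dither` are hypothetical names, and the dither is assumed to be RPDF with peak levels of plus or minus half a step.

```python
import math
import random

def quantize(x, step=1.0):
    """Uniform quantizer: round x to the nearest multiple of step."""
    return step * math.floor(x / step + 0.5)

def subtractive_dither(x, d, step=1.0):
    """Add dither d before the quantizer and subtract the same d afterwards."""
    return quantize(x + d, step) - d

if __name__ == "__main__":
    rng = random.Random(1)
    x = 0.3                                   # a constant input, in units of STEP
    errs = [subtractive_dither(x, rng.uniform(-0.5, 0.5)) - x for _ in range(20000)]
    print("undithered error:", quantize(x) - x)        # always the same: signal-dependent
    print("mean dithered error:", sum(errs) / len(errs))  # close to zero: noise-like
```

Without dither the error for a constant input of 0.3 STEP is a fixed -0.3 STEP, i.e. a signal-dependent distortion. With subtractive RPDF dither the error is spread uniformly over plus or minus half a step regardless of the input value.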

An audio word of B bits, each of which is a pseudo-random
binary sequence, is a $2^B$-level approximation to a signal
with RPDF statistics, so that the data noise signals
considered above may be used as dither signals for
dithering audio to eliminate nonlinear quantization
distortions and modulation noise. Similarly, the M-level
data noise signals described above in section 2.3 using the
remainder modulo M for data, if made to be of a pseudo-
random form by a pseudo-random data encoding/decoding
process, can be used as an M-level approximation to RPDF
noise.



Although data noise signals are discrete approximations to
RPDF noise, they can be converted to continuous RPDF noise
statistics by the simple process of adding to them an
additional smaller RPDF noise with peak levels ± ½ LSB,
where LSB is the step size of the LSBs of the transmitted
audio words (as distinct from the step size STEP = M LSBs
of any rounding process used in encoding hidden data
channels). This is shown schematically in figure 5.



Conventionally, as described in refs. [1] and [3], the use
of subtractive dither requires the use of a decoding
process in which during playback, the original dither noise
added before the quantizer is reconstructed before being
subtracted; this requires either the use of synchronized
pseudo-random dither generation algorithms, or an
encode/decode process in which the dither noise is
generated from the LSBs of previous samples of the audio
signal [3]. However, in the application of this invention,
as will be seen, no special dither reconstruction process
is required for the discrete dither, since this is already
present in the transmitted LSBs.

3.2 Noise shaping

A white error spectrum is not subjectively optimum for
audio signals, where it is preferred to weight the error
spectrum to match the ears' sensitivity to different
frequencies so as to minimize the audibility or perceptual
nuisance of the error. The spectrum of the error signal
may be modified to match any desired psychoacoustic
criteria by the process of noise shaping, discussed for
example in refs. [1], [4], [12], [17]-[19].

Noise shaping may be static (i.e. adjusting the spectrum in
a time-invariant way) and made to minimize audibility or
optimize perceptual quality at low noise levels, or
alternatively it can be made adaptive to the audio signal
spectrum so as to be optimally masked by the instantaneous
masking thresholds of audio signals at a higher level. The
latter option is particularly valuable in the present
application, where loud audio signals may well allow an
increased error energy to be masked, thereby allowing a
higher data rate to be transmitted in the hidden data
channels during loud audio passages.

The form of noise shaping with subtractive dither that may
be used in this description is indicated in the schematic
of figure 6. It will be noted that, while it is equivalent
to some of the forms described in ref. [1], it is not the
arrangement described previously by the inventors in ref.
[3], in that here we put the noise shaping loop around the
whole subtractive process. With the arrangement of figure
6, the output of the quantizer itself differs from the
noise shaped output of the whole system by a spectrally
white dither noise, so that in this arrangement, unlike
those suggested in ref. [3], the spectral shapes of the
quantizer output error and system output error are not
identical.

With the noise-shaped subtractively dithered quantizer of
fig. 6, the error feedback filter $H(z^{-1})$ must include a
1-sample delay factor $z^{-1}$ in order to be implementable
recursively, and the originally white spectrum of the
subtractively dithered quantizer is filtered by the
frequency response of the noise shaping filter

$$1 - H(z^{-1}) , \tag{3-1}$$

which is preferably chosen to be minimum phase to minimize
noise energy for a given spectral shape [1], and may be
chosen to be of any desired spectral shape.

Other implementations of noise shaping around a dithered
quantizer system are possible. Alternative implementations
are reviewed in ref. [4]. By way of example, fig. 7 shows
an alternative "outer" form of noise shaping architecture
described in ref. [4], that is equivalent to fig. 6 if one
puts

$$H'(z^{-1}) = H(z^{-1})/(1 - H(z^{-1})) . \tag{3-2}$$

The application of noise shaping around a subtractively
dithered quantizer will not result in any unwanted
nonlinear distortion or modulation noise, provided that the
dither noise added in figs. 6 or 7 is RPDF dither matched
to the step size STEP of the quantizer.

4. Application to buried data channels

4.1 Noise-shaped subtractively dithered buried channel
encoding

Either the arrangement of fig. 6 or fig. 7 can be applied
to obtain subtractively dithered noise-shaped audio results
when the last digits of an audio signal word (whether the
last N binary digits or the remainder after division by M)
are replaced by buried data bits.

The procedure is now simple to describe. First the data is
pseudo-randomized, and then used to form a data noise
signal as described above. This data noise signal has
(discrete M-level) RPDF statistics and may be used as the
dither noise source in figures 6 or 7, as shown in figs. 8
and 9, where the quantizer is simply the process of
rounding the signal word to the nearest integer multiple of
M LSBs (or the nearest level if the levels are placed
uniformly at other than the integer multiple of M LSBs).
The process shown in figures 8 or 9 subtracts the data
noise signal from the audio at the input of the uniform
quantizer (which has step size STEP = M LSBs), and adds it
back again at the output of the quantizer so as to make the
least significant digits of the output audio word equal to
the data noise signal. Noise shaping is performed around
this whole process.
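
The procedure just described can be sketched in Python as an error-feedback loop around a "subtract data, round to a multiple of M, add data back" quantizer. This is a hedged sketch, not the circuit of figures 8 or 9 itself: the function names are hypothetical, the data noise values are assumed to lie in 0..M-1, and the one-tap feedback filter `h=(1.0,)` is purely illustrative (it gives the shaping response 1 - z^-1) and is not the psychoacoustically optimized filter of the references.

```python
import math

def bury(audio, data, M, h=(1.0,)):
    """Embed data (values 0..M-1) in audio words with noise-shaped subtractive dither."""
    err = [0.0] * len(h)                  # err[-k] holds the total error e[n-k]
    out = []
    for x, d in zip(audio, data):
        # Error feedback: u = x - H e, so the output error is shaped by 1 - H.
        u = x - sum(hk * err[-k] for k, hk in enumerate(h, start=1))
        # Subtract data noise, round to nearest multiple of M, add data noise back.
        y = M * math.floor((u - d) / M + 0.5) + d
        out.append(y)
        err = err[1:] + [y - u]
    return out

def recover(words, M):
    """The data is simply the remainder of each output word after division by M."""
    return [w % M for w in words]
```

Whatever filter `h` is chosen, `recover(bury(audio, data, M, h), M)` returns `data`, which illustrates why the noise shaping does not need to be known to, or standardized for, the data decoder.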






For best results using the algorithms of figs. 8 or 9 (or
equivalent algorithms such as that in figure 10 below), it
is best if the input audio word signal is available at a
higher resolution or wordlength than that used in the
output, since this will avoid cascading the rounding
process used in figs. 8 or 9 with another earlier rounding
process. By making the input signal available at the
highest possible resolution, any overall degradation of
signal-to-noise ratio is minimized.

Since the output equals the output of the quantizer plus
the data noise signal, the noise shaping has no effect on
the information representing the data in the output audio
word, but merely modifies the process by which the
quantization of the audio is performed so as to minimize
the perceptual effect of the added data noise on the audio.
It is remarkable that this output signal, being the output
of a noise-shaped subtractively dithered quantizer,
automatically incorporates all the benefits of noise shaped
subtractive dither without the audio-only listener needing
any special subtractive decoding apparatus.

Moreover, because the information received by the data-
channel user is not dependent on the noise shaping process,
the noise shaping can be varied in any way desired without
affecting reception of the data (provided only that no
overflow occurs in the noise shaping loop near peak audio
levels; fitting a clipper in the signal path before the
quantizer to prevent this may be desirable). Thus the
noise-shaping process does not affect the way the signal is
used by either audio or data end-users of the signal, and
so does not need any standardization, but may be used in
any way desired by the encoding operative to achieve any
desired kind of static or dynamic noise shaping
characteristic.

Other equivalent noise-shaped dithering architectures may
be used in place of those shown in figs. 8 and 9 for
encoding the data signals into the output audio word, using
the kind of equivalent architectures discussed in ref. [4].
Purely by way of example, fig. 10 shows yet another
implementation having identical performance to that shown
in figs. 8 or 9. It is also evident that in a similar way,
the data noise signal can be added and subtracted outside
the "outer" noise shaper of fig. 9 rather than inside the
noise shaper as shown.

Although noise shaping is preferably used for systems of
adding buried data according to the methods of subtractive
dither by the buried data as described above with reference
to figs. 8 to 10, it may also be applied to those systems
in which the buried data is not subtracted before the
quantizer but only added after the quantizer, for example
as in figs. 8b and 9b, where the subtraction node of
Figures 8a and 9a immediately before the quantizer is
omitted, or where the signal fed to this node is
conventional additive pseudo-random dither noise rather
than the pseudo-randomised data signal. Omission of the
subtracted data noise signal or its substitution by
conventional dither noise at the node before the quantizer
typically loses some of the quality advantages of the
preferred process for burying data, typically increasing
reproduced noise levels in the modified digital word by
6 dB. Nevertheless, if such a procedure is adopted, its
subjective performance will be maximised if a noise shaping
such as illustrated in Figures 8 or 9 is used around the
process in which pseudo-random dither is added before the
quantizer and the data noise signal is added only after the
quantizer.

Explicit coefficients for the noise shaper filter $H(z^{-1})$
that may be used to reduce the audibility of buried data on
compact disc and other audio media at sampling rates of
44.1 or 48 kHz, with or without audio pre-emphasis, are
described in reference 12.

It will of course be realised that subtracting a data
noise signal before a quantizer and adding it after the
quantizer is equivalent to adding a polarity-reversed data
noise signal before the quantizer and subtracting it after
the quantizer. Since a polarity-reversed data noise signal
conveys exactly the same data information as the original
data noise signal, it will be seen that all descriptions
of the invention are equivalent to the case where a data
noise signal is added before the quantizer and subtracted
after it. Which realisation is adopted in practice is
purely a question of convenience.

4.2 Buried Channel Decoding 

Optimum recovery of the audio channels involves no need for
any kind of decoder in this proposal. Playback is
conventional, with the effect of subtractive dither by the
data noise signal being automatic as described above.



Recovery of the buried data is also straightforward, simply
being recovery of the data noise signal by rejecting the
highest bits of the received audio word, or in the case of
M-level data, the inverse process to the encoding, of
reading the remainder of the audio word after division by
M, i.e. resolving the least significant digits of the audio
word via modulo M arithmetic. This is followed by the
inverse pseudo-random decoding process to recover the data
before pseudo-randomization, and then the data is handled
as data in the usual way. This decoding process is shown
schematically in figure 11.
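The encode and decode steps can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the scheme fixed by this specification: the function names are hypothetical, the pseudo-randomisation is taken to be addition modulo M of a pseudo-random value `prn` shared by encoder and decoder, and samples are expressed in LSB units.

```python
import math

def nearest_int(x):
    # Rounding to the nearest integer: integer part of x + 1/2.
    return math.floor(x + 0.5)

def bury(x, data, prn, M):
    """Bury one M-level data value in audio sample x (LSB units).

    prn is an assumed pseudo-random value in 0..M-1 known to both ends.
    """
    d = (data + prn) % M               # pseudo-randomised data noise value
    q = M * nearest_int((x - d) / M)   # quantize (x - d) with step M LSB
    return q + d                       # add the data noise back

def recover(word, prn, M):
    """Recover the data value from a received integer audio word."""
    d = word % M                       # remainder after division by M
    return (d - prn) % M               # undo the assumed pseudo-randomisation
```

Because the quantized value is a multiple of M, the remainder of the output word modulo M is exactly the data noise value, so the audio-only listener needs no decoder and the data user needs only the modulo operation.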

In the case that the data is encoded as integer
coefficients $w_j$ with more than one base $M_j$ as in
Eq. (2-7) above, the data is recovered by K successive
divisions by $M_1$ to $M_K$, at each stage discarding the
fractional part, the K coefficients $w_j$ being the integer
remainders of the division by $M_j$. This is the same
process shown in figure 11, but with K stages of the modulo
division.
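The K-stage recovery amounts to a mixed-radix digit extraction. The sketch below is illustrative only: Eq. (2-7) is not reproduced here, so the ordering of the bases (first base taken as the least significant) and the function names are assumptions.

```python
def split_mixed_radix(value, bases):
    """Return the coefficients obtained by successive division by each base."""
    coeffs = []
    for M in bases:
        value, w = divmod(value, M)  # w: integer remainder, value: quotient
        coeffs.append(w)
    return coeffs

def join_mixed_radix(coeffs, bases):
    """Inverse of split_mixed_radix, rebuilding the integer from its digits."""
    value = 0
    for w, M in zip(reversed(coeffs), reversed(bases)):
        value = value * M + w
    return value
```

For example, with bases `[3, 5]` the integer 13 splits into remainders 1 and 4, since 13 = 1 + 3 × 4.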

5. Vector quantization and dither 

5.1 Reasons for digression 

The above descriptions of the use of noise shaped
subtractive dithering also apply to the stereo parity
coding case. To see this, we need first to look at
vector quantization and vector dithering, and show that
exactly the same ideas for subtractive dithering, noise
shaping and data encoding can be applied to the vector
quantizer case as the scalar case described above.

The description here is given in greater generality than
needed just for the stereo parity coding case, since it has
applications to coding information in the parity of the
corresponding bits in 3 or more channels in transmission
media carrying more than two audio or image channels, for
example in the 3 channels containing the 3 components of a
color image.

5.2 Uniform vector quantizers

As briefly indicated in earlier papers [1], [3], [9], the
concepts of additive and subtractive dither can be applied
to vector as well as scalar quantizers. Vector quantizers
quantize a vector signal $y$ comprising n scalar signals
$(y_1,\ldots,y_n)$ in geometrical regions covering the
n-dimensional space of n real variables. As in the scalar
case, we shall say that a vector quantizer Q is a uniform
quantizer if the signal $y$ is quantized to a point of a
discrete grid G of quantization vectors $\{y_g : g \in G\}$,
where there exists a region C around $(0,\ldots,0)$ of
n-dimensional space such that the regions
$y_g + C = \{y_g + c : c \in C\}$ cover without overlap
(except at their boundary surfaces) the range of signal
variables $y$ being quantized. Thus a uniform vector
quantizer divides the n-variable space into a grid of
identical vector quantization cells that are translates of
the cell C to the points $y_g$ of the grid G, and quantizes
or rounds any point in the cell $y_g + C$ to the point
$y_g$.

There are many examples of uniform vector quantizers, the
simplest of which has a hypercubic cell C = the region
$\{(c_1,\ldots,c_n) : |c_i| \le \tfrac{1}{2}\mathrm{STEP}\ \forall\, i = 1,\ldots,n\}$,
i.e. separate scalar quantization of the n variables. The
grid G in this case is simply points of the form
$(m_1\mathrm{STEP}, m_2\mathrm{STEP}, \ldots, m_n\mathrm{STEP})$
for integer $m_j$'s, and the associated vector quantizer is
simply that which takes $(y_1,\ldots,y_n)$ to
$m_j = \mathrm{integer}(y_j/\mathrm{STEP})$ for
$j = 1,\ldots,n$. This case is trivial in the sense that it
is equivalent to using separate uniform scalar quantizers on
each of the n channels.

A more complicated but easily visualised example is the
2-channel case where C is a regular hexagon in the plane, for
example the region consisting of the points $(c_1, c_2)$ in
the plane such that

$|c_1| \le \tfrac{1}{2}\mathrm{STEP},\quad |-\tfrac{1}{2}c_1 + (\sqrt{3}/2)c_2| \le \tfrac{1}{2}\mathrm{STEP},\quad |-\tfrac{1}{2}c_1 - (\sqrt{3}/2)c_2| \le \tfrac{1}{2}\mathrm{STEP},$   (5-1)

and the grid G is the centers of the hexagons in the
honeycomb grid covering the plane, i.e. G is the points

$((m_1 + \tfrac{1}{2}m_2)\mathrm{STEP},\ (\sqrt{3}/2)\,m_2\mathrm{STEP})$   (5-2)

for integer $m_1$ and $m_2$.

A uniform vector quantizer of particular interest and
practical use in n dimensions is what we shall term the
rhombic quantizer. This starts off with a conventional
hypercubic grid $G_c$ of points at positions
$(m_1\mathrm{STEP}, m_2\mathrm{STEP}, \ldots, m_n\mathrm{STEP})$,
where STEP is a step size, and $m_1$ to $m_n$ are integers,
which of course has the hypercube quantizer cell described
just above and corresponds to the use of n separate scalar
uniform quantizers. However, we then produce a new grid
$G \subset G_c$ which consists of just those grid points in
$G_c$ with $m_1 + \ldots + m_n$ having even integer values.
This new grid only has half as many points as the original,
and can be equipped with a new vector quantization cell C
as follows, which we shall term the n-dimensional rhombic
quantizer cell.

The rhombic quantizer cell can be described geometrically
by thinking of the original hypercubic cells as being
colored white if $m_1 + \ldots + m_n$ is even and black if
$m_1 + \ldots + m_n$ is odd, forming a kind of n-dimensional
checkerboard pattern of alternately black and white
hypercubes. Then attach to each white hypercube that
"pyramid" portion of each adjacent black hypercube lying
between the center of the black hypercube and the common
"face" with the white hypercube. The resulting solid is the
rhombic cell C.

It is evident, since the pyramid portions taken from
adjacent black hypercubes are in total enough to form one
black hypercube if pieced together, that the volume
occupied by the rhombic quantizer cell is twice that
occupied by the original hypercube quantizer cell, and that
the versions of the rhombic quantizer cell translated by
the grid G indeed cover the n-dimensional n-parameter
vector signal space.

For n = 2, the rhombic quantizer cell C is a diamond shape,
being a square whose sides are rotated 45° relative to the
channel axes, as shown in fig. 12. For n = 3, the rhombic
quantizer cell C is a rhombic dodecahedron, a 12-faced solid
whose faces are rhombuses. For n = 4, the rhombic
quantizer cell C is a regular polytope unique to 4
dimensions termed the regular 24-hedroid.

Calculations involving quite complicated multidimensional
integrals, which we shall not detail here, show, for a
given large number of quantizer cells covering a large
region of n-dimensional space, that for n = 2, rhombic
quantization has the same signal-to-noise ratio (S/N) as
conventional independent quantization of the channels, but
that for n ≥ 3, rhombic quantizers give a better S/N than
conventional independent quantization of the channels. The
improvement reaches a maximum of about 0.43 dB when n = 6.
This improvement in the S/N is maintained when subtractive
dither is used as described below. The hexagonal 2-channel
quantization described above gives a 0.16 dB better S/N
than independent quantization of 2 channels.

Mathematically, the rhombic quantizer has grid G consisting
of the points

$(m_1\mathrm{STEP}, m_2\mathrm{STEP}, \ldots, m_n\mathrm{STEP})$,   (5-3a)

where the $m_i$ have integer values with

$m_1 + \ldots + m_n$ having even integer values.   (5-3b)

The rhombic cell C is that region of points
$(c_1,\ldots,c_n)$ satisfying the n(n-1) inequalities

$|c_i + c_j| \le \mathrm{STEP},\quad |c_i - c_j| \le \mathrm{STEP},$   (5-4)

for i ≠ j selected from 1,...,n. The associated uniform
vector quantizer rounds a vector signal $(y_1,\ldots,y_n)$
to the grid point $(m_1\mathrm{STEP},\ldots,m_n\mathrm{STEP})$
by an algorithm whose outline form might be

    m'_i := integer(y_i/STEP) for all i = 1,...,n,
    If m'_1 + ... + m'_n is even
    then m_i := m'_i for all i = 1,...,n,
    else c_i := y_i/STEP - m'_i,
    (*)  d_j := sgn(c_j) if |c_j| > |c_i| for all i < j and
                            |c_j| >= |c_i| for all i > j,
         d_i := 0 for all other i,
         m_i := m'_i + d_i for all i = 1,...,n.
    End If                                              (5-5)

The function x → integer(x), where, e.g., x = y/STEP in
the above formula, here is that "rounding" function that
takes a number x to the nearest integer value, i.e. which
takes x to the integer part of x + 1/2 by discarding the
fractional part of x + 1/2.
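A direct transcription of the outline algorithm (5-5) into Python might look as follows. This is a sketch rather than a definitive implementation: the function names are hypothetical, and treating sgn(0) as +1 is an assumption made so that the parity is always corrected.

```python
import math

def nearest_int(x):
    # The "integer" rounding function: integer part of x + 1/2.
    return math.floor(x + 0.5)

def rhombic_quantize(y, step=1.0):
    """Return integer grid indices (m_1..m_n) with an even sum."""
    m = [nearest_int(v / step) for v in y]
    if sum(m) % 2 != 0:
        # Residuals of the independent (hypercubic) rounding.
        c = [v / step - mi for v, mi in zip(y, m)]
        # Line (*): first index whose residual has the largest magnitude.
        j = max(range(len(c)), key=lambda i: abs(c[i]))
        m[j] += 1 if c[j] >= 0 else -1   # sgn(0) taken as +1 (assumption)
    return m
```

For instance, `rhombic_quantize([0.9, 0.3])` returns `[1, 1]`: independent rounding gives (1, 0), whose sum is odd, and the second coordinate has the larger residual, so it is moved up by one step.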

There are, of course, various equivalent forms for this
kind of rhombic quantizer algorithm, a computationally
demanding aspect on typical signal processors being the
determination in line (*) of that j for which $|c_j|$ is
biggest.

In the n = 2 case, there is a simpler rhombic quantization
algorithm as follows

    x_1 := (y_1 + y_2)/sqrt(2),  x_2 := (y_1 - y_2)/sqrt(2),
    m'_1 := integer(x_1/(sqrt(2) STEP)),
    m'_2 := integer(x_2/(sqrt(2) STEP)),
    m_1 := m'_1 + m'_2,  m_2 := m'_1 - m'_2,            (5-6)

which is based on the observation that the rhombic
quantizer cell for n = 2 is the same shape as the square
cell used for ordinary independent quantization of the two
channels, but rotated by 45° and with an increase of the
step size by a factor $\sqrt{2}$. (See fig. 12.)
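The two-channel shortcut can be sketched in Python as below. The sketch works with the unnormalised sum and difference divided by 2 STEP, which is algebraically the same as rotating by 45° and quantizing with a step of √2 STEP; the function names are hypothetical.

```python
import math

def nearest_int(x):
    # Integer part of x + 1/2.
    return math.floor(x + 0.5)

def rhombic_quantize_2d(y1, y2, step=1.0):
    """Return integer grid indices (m1, m2) with m1 + m2 even."""
    a = nearest_int((y1 + y2) / (2 * step))  # quantized sum coordinate
    b = nearest_int((y1 - y2) / (2 * step))  # quantized difference coordinate
    return a + b, a - b
```

Since m1 + m2 = 2a, the output always lies on the even-sum grid without any search for the largest residual.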

5.3 Subtractive vector dither

The concepts of dithering for uniform quantizers developed
in refs. [1-4] for scalar uniform quantizers may be applied
also to the vector case by using appropriate vector
dithers. An n-signal dither noise vector
$(\nu_1,\ldots,\nu_n)$ is said to have a uniform probability
distribution function in a region C of n-dimensional space
if its joint probability distribution function is constant
within the region C and zero outside it. This is the
n-dimensional generalization of rectangular PDF dither for
vector signals, and we denote the associated n-vector
dither signal by $r_C$.

It can be shown (we omit any proofs here) that if the
subtractive dither arrangement of figure 4 is used for
modifying an input vector signal, where the "uniform
quantizer" becomes a vector uniform quantizer with
quantization cell C, and the dither noise becomes a uniform
PDF vector dither $r_C$ on the region C, then the output
vector signal of the system is free of all nonlinear
distortion and modulation noise effects, i.e. the first
moment of the output signal error is zero, and the second
moment independent of the input signal [4]. Moreover, this
is still the case if any statistically independent
additional noise is added to the uniform PDF dither noise
on the region C.

Moreover, noise shaping can be applied around such
subtractive dither in exactly the same way as before, as
shown in figs. 6 and 7, or in equivalent noise shaping
architectures, the only difference being that any filtering
is now applied to n parallel signal channels. It is also
possible, if desired, to use an n x n matrix error feedback
filter $H(z^{-1})$ or $H'(z^{-1})$ to make the noise shaping
dependent on the vector direction, for example to optimize
directional masking of noise by signals [9], [10].



It is possible to generate uniform PDF vector dither
over the rhombic cell C described above by an algorithm
such as the following: First generate, for example by the
well-known congruence method, n statistically independent
rectangular PDF dither signals $r_i$ (i = 1,...,n) with peak
values ±½ STEP, and also generate an additional two-valued
random or pseudorandom signal u with a value of either 0 or
1. Then the values of the noise signal
$\nu = (\nu_1,\ldots,\nu_n)$ are given by:

    If u = 0
    then nu_i := r_i for all i = 1,...,n,
    else d_j := sgn(r_j) if |r_j| > |r_i| for all i < j and
                            |r_j| >= |r_i| for all i > j,
         d_i := 0 for all other i,
         nu_i := r_i - d_i STEP for all i = 1,...,n.
    End If.                                             (5-7)
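The dither generator (5-7) can be sketched in Python as follows. The function name and the use of Python's `random` module in place of a congruence generator are assumptions made for illustration; sgn(0) is taken as +1.

```python
import random

def rhombic_dither(n, step=1.0, rng=random):
    """Draw one dither vector uniformly distributed over the rhombic cell."""
    # n independent rectangular-PDF components with peak values +/- step/2.
    r = [rng.uniform(-step / 2, step / 2) for _ in range(n)]
    u = rng.randrange(2)                 # two-valued signal, 0 or 1
    if u == 1:
        # First index whose component has the largest magnitude.
        j = max(range(n), key=lambda i: abs(r[i]))
        # Shift it by one STEP against its sign, into a "pyramid" region.
        r[j] -= step if r[j] >= 0 else -step
    return r
```

With u = 0 the vector falls in the central hypercube, and with u = 1 it falls in one of the attached pyramid portions, so every sample satisfies the cell inequalities (5-4).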

However, in applications of subtractive dither, this
algorithm may involve unnecessary complication, since it
can be shown that with the subtractive dither arrangement
of fig. 4 with a uniform vector quantizer with quantization
cell C, a uniform PDF vector dither signal $r_D$ may be
used for any other uniform quantization cell D sharing the
same grid G, and will still eliminate nonlinear distortion
and modulation noise in the output. Whatever the shape of
the other quantization cell D used for the dither signal,
the resulting error signal from the subtractive dither
arrangement of fig. 4 is a noise signal with uniform PDF
statistics on the quantizer cell C of the uniform vector
quantizer used.



This can allow a much simpler algorithm to be used for
generating the vector dither in which uSTEP is added to (or
subtracted from) just one of the n rectangular PDF noise
components. For example, a uniform PDF vector dither noise
signal $\nu = (\nu_1,\ldots,\nu_n)$ given by

    nu_1 := r_1 - u STEP,
    nu_i := r_i for i = 2,...,n,                        (5-8)

may be used to subtractively dither the above rhombic
quantizer.
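To illustrate, the following Python sketch applies the simple dither of (5-8) subtractively around a rhombic quantizer and returns the dithered output. The helper names are hypothetical, the quantizer follows the outline of (5-5) with sgn(0) taken as +1, and Python's `random` module stands in for whatever noise source is actually used.

```python
import math
import random

def rhombic_quantize(y, step=1.0):
    """Round y to the nearest even-sum grid point; returns coordinates."""
    m = [math.floor(v / step + 0.5) for v in y]
    if sum(m) % 2 != 0:
        c = [v / step - mi for v, mi in zip(y, m)]
        j = max(range(len(c)), key=lambda i: abs(c[i]))
        m[j] += 1 if c[j] >= 0 else -1
    return [mi * step for mi in m]

def simple_vector_dither(n, u, step=1.0, rng=random):
    """Dither in the style of (5-8): u*STEP is subtracted from the first
    of n rectangular-PDF components."""
    nu = [rng.uniform(-step / 2, step / 2) for _ in range(n)]
    nu[0] -= u * step
    return nu

def subtractive_dither(y, nu, step=1.0):
    """Subtract the dither, quantize, then add the same dither back."""
    z = [a - b for a, b in zip(y, nu)]
    g = rhombic_quantize(z, step)
    return [a + b for a, b in zip(g, nu)]
```

The error between output and input is simply the quantizer's rounding error on the dithered signal, so it always lies inside the rhombic cell regardless of which cell shape the dither was drawn from.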



6. Refinements of the basic proposal

6.1 Further developments 

The encoding process described above will work well as it
stands, but does not incorporate various desirable
refinements which we shall now describe. These include
methods to take account of the fact that the data noise
signal has a discrete and not a continuous PDF dither, and
applications involving stereo parity coding.

6.2 Non-discrete dither

The fact that the dither given by the data noise signal has
an M-level discrete probability distribution function
rather than a continuous RPDF means that there is still
unwanted quantization distortion at the level of the LSB of
the audio word which is not properly dithered. Preferred
methods of adding "non-discrete" dither or, strictly
speaking, dither at a significantly high arithmetic
accuracy such as implemented using 24 or 32 bit arithmetic
are now described. The method of adding such dither shown
in fig. 5 is not preferred for three reasons:

(i) Optimum playback requires subtractive decoding of the
±½ LSB RPDF dither signal, with all the usual problems of
implementing subtractive dither [1], since unlike the
discrete data noise signal, this is not explicitly
transmitted in the audio word.

(ii) The ±½ LSB RPDF dither signal added before the
quantizer does not eliminate modulation noise in
non-subtractive playback, having the wrong statistics for
this purpose [2], and

(iii) If the whole system is noise shaped as in figs. 6
or 7, the nonsubtractive listener will hear the ±½ LSB RPDF
dither signal as having a white spectrum not affected by
the noise shaping, so will perceive an increase in noise
level.

A correct way of adding extra dither to avoid nonlinear
quantization distortion and modulation noise at the ±½ LSB
level is shown in figure 13. The dither used has a
triangular PDF with peak levels ±1 LSB (so-called TPDF
dither) with independent statistics at each discrete time
instant, so as to eliminate modulation noise in
nonsubtractive playback [2], and is added before the
quantizer in the noise shaping loop, but not subtracted in
the noise shaping loop. This ensures that the added noise
in nonsubtractive playback is noise shaped.

Subtractive playback of the extra dither is done, also as
shown in fig. 13, by reconstituting the triangular ±1 LSB
PDF dither at the playback stage, passing it through a
noise shaping filter $1 - H(z^{-1})$, and subtracting the
filtered noise from the output audio word. Subtractive
playback of course reduces the extra noise energy caused by
the non-discrete dither by a factor 3, although this will
only be highly advantageous when the data noise signal has
fairly low energy, e.g. at a data rate of 1 BPSS.
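One reading of the factor 3 follows from the standard noise powers of dithered quantization. The short derivation below is a sketch under the usual assumptions (step size $\Delta$ = 1 LSB, quantization error uniformly distributed and independent of the dither); the symbols $\sigma^2_Q$ and $\sigma^2_T$ are introduced here for illustration.

```latex
\sigma^2_Q = \frac{\Delta^2}{12}
\qquad\text{(rectangular quantization error, peak } \pm\tfrac{1}{2}\Delta)

\sigma^2_T = 2\cdot\frac{\Delta^2}{12} = \frac{\Delta^2}{6}
\qquad\text{(triangular dither, peak } \pm\Delta)

\text{nonsubtractive: } \sigma^2_Q + \sigma^2_T = \frac{\Delta^2}{4},
\qquad
\text{subtractive: } \sigma^2_Q = \frac{\Delta^2}{12},
\qquad
\frac{\Delta^2/4}{\Delta^2/12} = 3 \;\;(\approx 4.8\ \mathrm{dB})
```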

The triangular dither signal may be generated, in encoding,
as proposed in the "autodither" proposal of ref. [3] by
means of a pseudo-random logic look-up table or a logic
network having the effect of a pseudo-random look-up table,
from the less or least significant parts of the output
audio word in the last K previous samples, where typically
K may be 24, and can be reconstructed from the same audio
word at the input of the system by the same look-up table
or logic in the decoding stage. This is shown for the
system of fig. 13 in fig. 14.

Although figures 13 and 14 are shown for the particular
noise shaping architecture of fig. 6, similar ways of
adding the extra triangular dither can be used with any
other equivalent noise shaping architecture such as the
outer form of figure 7 and fig. 10, again by adding the
triangular dither just before the quantizer and subtracting
it, via a noise shaping filter $1 - H(z^{-1})$, only at the
output of the decoder. It is clear that the points at
which dither signals are added can be shifted around in
various ways without affecting the functionality of the
system.

A disadvantage of the methods for adding ±1 LSB triangular
PDF dither shown in figs. 13 and 14 is that in these
schematics, the noise shaping filter $1 - H(z^{-1})$ used for
the triangular PDF dither and for the quantizer is identical.
Especially in systems of subtractive dither where the noise
shaping of the subtracted dither in the decoder is
desirably standardised (see ref. [3]), this would not allow
use of noise shaping around the quantizer with a
non-standardised characteristic, such as for example noise
shaping adaptive to the signal waveform level and spectrum
to take advantage of auditory masking by the signal.

An alternative shown in fig. 15 avoiding this disadvantage
uses a first possibly fixed or standardised filter
$1 - H_1(z^{-1})$ for the ±1 LSB triangular PDF dither noise
subtractive decoding, but now uses the same filter in the
encoding, and instead adds this filtered ±1 LSB triangular
PDF dither noise to a point before the noise shaping loop.
The noise shaping loop around the quantizer may then use a
second possibly different error feedback filter
$H_2(z^{-1})$ in place of $H(z^{-1})$ to achieve any desired
predetermined quantizer noise shaping characteristic
$1 - H_2(z^{-1})$, including ones adaptive to the signal
waveform.

6.3 The stereo parity case

Suppose we have 2-channel stereo signals in which data is
encoded pseudorandomly in bit N for all N = 15 to say
15-h+1 (where the integer h may typically be any integer
from say 0 to perhaps 6 or 8, the case 0 being the case of
no bits being encoded) of the left and right audio words,
with data also being encoded in the stereo parity (Boolean
sum) of bit 15-h of the left and right audio words, as
described in subsection 2.2 above.

Based on the results on uniform vector quantization and
subtractive vector dither of section 5 above, the
noise-shaped subtractive encoding of the data described
above in the scalar case for individual audio channels may
be applied to this case too with just two reinterpretations
of the above:

(i) The uniform quantizer used in figs. 6-10 now becomes
    a uniform 2-dimensional rhombic quantizer (such as
    described in Algorithm (5-6) and illustrated in fig.
    12) with STEP = $2^h$ LSB.

(ii) The "data noise signal" used for dithering is
    given, for example, by Eqs. (5-8) where $r_i$ is the
    data noise signal of the last h bits of the i'th-channel
    audio word (with the first channel being say left and
    the 2nd channel being say right), and u being the
    parity of bits 15-h of the left and right audio words.
    In units of LSB, the data noise signal for the left
    channel is then $L_q - 2^h u$ and for the right channel
    is $R_q$, where $L_q$ and $R_q$ are the respective
    integer words represented by the last h bits of the
    audio word formed by the data in the two channels.

Any alternative data noise signal may be used that
represents an appropriate uniform PDF vector dither as
described in section 5.3, such as for example that given by
Algorithm (5-7).

The residual nonlinear distortion and modulation noise
effects at the LSB level caused by the fact that the vector
data noise is discrete rather than continuous can be
removed by using exactly the same technique described in
subsection 6.2 and figs. 13 and 14 above by adding and,
where appropriate, subtracting ±1 LSB triangular PDF dither
in each channel separately, the only difference being that
the uniform quantizer has become a rhombic vector quantizer
and the data noise signal has a modified vector form as
just described.

The particular case h = 0, where data is transmitted only
in the parity of the LSB of the audio word in 2 channels,
simply uses the parity signal itself at the LSB level as a
"data noise signal" in one of the two channels in the
encoding process - it does not matter which of the two
channels is chosen. With subtractively dithered playback,
it turns out that the use of properly designed stereo
parity coding of data, using a rhombic vector quantizer in
the encoding process, gives a total noise level 1 dB lower
than would the process of coding the data into the LSBs of
the words of just one of the two audio channels. Thus
stereo parity coding at low bit rates not only ensures
audio left/right symmetry for added noise, but gives a
significant noise level advantage.
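The h = 0 case can be sketched in Python as follows. This is an illustrative sketch only: the function names are hypothetical, samples are in LSB units, the parity bit u is assumed to be already pseudo-randomised, and the extra triangular dither and noise shaping are omitted.

```python
import math

def nearest_int(x):
    # Integer part of x + 1/2.
    return math.floor(x + 0.5)

def encode_parity_bit(left, right, u):
    """Bury bit u (0 or 1) in the LSB parity of a stereo sample pair."""
    zl = left - u                          # subtract the data noise (left only)
    a = nearest_int((zl + right) / 2)      # 2-D rhombic quantizer, STEP = 1 LSB
    b = nearest_int((zl - right) / 2)
    return a + b + u, a - b                # add the data noise back

def decode_parity_bit(L, R):
    """Read the bit back as the Boolean sum of the two LSBs."""
    return (L + R) % 2
```

The quantized pair always has an even sum, so the parity of the output words equals u. As a rough check of the 1 dB figure, assuming rectangular error statistics: coding into one channel's LSB gives noise powers of 4/12 and 1/12 LSB² in the two channels (5/12 in total), whereas the rhombic quantizer gives 2/12 in each (4/12 in total), a ratio of 5/4 or about 0.97 dB.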

6.4 Generalized stereo parity coding

There are various generalizations of the particular stereo
parity coding case just described. We outline these
briefly to show the applicability of these ideas to other
cases.

A first generalization is that the same process may be
applied to other audio wordlengths besides the 16 bit
wordlength of CD, for example to the 10 bit wordlengths of
NICAM encoded digital signals or to the 20 bit or 24 bit
wordlengths used in some professional audio applications
when it is desired to hide data in the audio words. For
example, in ref. [3], the inventors described a proposal to
add data at the 24th bit in studio operations on signals to
detect whether or not they had been modified, and the
data-encoding techniques of this invention can be used in
that application to minimize the audibility of the
modification of the signal proposed there.

The second generalization is that one can also apply stereo
parity coding to the case where one replaces the
$2^h$-level data in the last h bits by an M-level case for
any integer M > 1. In this case, data is coded into the
residue of the audio words of the two channels after
division by M, and the "stereo parity" data channel is
coded into the Boolean sum of the binary LSB in the two
channels of the integer parts of the audio words divided by
M. This case is handled identically to that in the
previous sub-section 6.3 except that $2^h$ is replaced
throughout by M, and the phrase "last h bits" is replaced
by "residue modulo M".

A third generalization instead considers n channels rather
than two. As before, this uses a rhombic quantizer in the
encoding process for STEP = M LSBs, but now the
n-dimensional rhombic quantizer described in (5-3) to (5-5)
above, and a vector data noise signal comprising the n
M-level data noise signals generated for the residue modulo
M data conveyed in each of the n audio channels, to just
one of which at each instant is added or subtracted uSTEP,
where u is the parity (i.e. Boolean sum) of the binary LSB
in the n channels of the integer parts of the audio words
divided by M. Other than replacing the ordinary uniform
quantizers with STEP step size by a rhombic quantizer and
using the modified data noise signal, the descriptions
given earlier for coding data still apply to this case.
Note that the choice of which channel of the vector data
noise signal to add or subtract uSTEP, and the choice of
whether to add or subtract, can be made freely, and that
this choice can be made adaptively instant by instant to
minimize data noise energy if desired, e.g. by making that
choice which minimizes the maximum of the data noise
signals in the n channels at each instant. This choice is
(a discrete approximation to) that described in (5-7) for
uniform PDF vector dither over a rhombic quantizer cell.

6.5 Low bit-rate case 

If one has n transmitted channels of audio, then the parity
of their LSBs can be used to transmit a 1 bit per
n-channel-sample data channel, with remarkably little loss
of S/N, especially in the case that full subtractive
dithering is used at the LSB level. One might expect a loss
of S/N of 6.02/n dB because the loss is shared among n
channels, but for n > 2, one gets a smaller loss, typically
between 0.3 and 0.4 dB better, because of the fact noted in
section 5 that rhombic vector quantization has a better S/N
than independent channel quantization for a given density of
quantization points in the quantization grid. For n = 6,
a 1 bit per n-channel-sample subtractively dithered buried
data channel causes a S/N degradation of under 0.6 dB
compared with a properly dithered case with no buried data
channel.
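The figures quoted here can be checked with a short calculation. The derivation below is a sketch assuming that halving the density of grid points in n dimensions is equivalent to scaling the step size by $2^{1/n}$, and that the 0.43 dB rhombic advantage quoted in section 5.2 for n = 6 applies directly.

```latex
\text{loss}_{\text{expected}}
  = 10\log_{10}\!\left(2^{2/n}\right)
  = \frac{20\log_{10}2}{n}
  \approx \frac{6.02}{n}\ \mathrm{dB}

n = 6:\quad \frac{6.02}{6} \approx 1.00\ \mathrm{dB},
\qquad
1.00\ \mathrm{dB} - 0.43\ \mathrm{dB} \approx 0.57\ \mathrm{dB} < 0.6\ \mathrm{dB}
```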



Exactly the same techniques can be used to convey data via
q successive samples of a monophonic signal, for example by
coding into the parity of the LSBs of each successive block
of q samples, as described in section 2.3. What has been
shown is that by using the parity signal as a subtractive
dither for any one sample with a q-dimensional rhombic
quantizer, plus normal triangular additive or subtractive
dither, this fractional rate channel can be coded with
a very small loss of S/N (e.g. 0.6 dB for a block length
q = 6), and yet with no nonlinear distortion or modulation
noise in either nonsubtractive or subtractive reproduction.



This kind of efficient low bit-rate culling of data
capacity could be used, for example, with successive
samples within individual sub-band channels of a sub-band
data compression system. Its application is not confined
to audio; culling say 1 bit per 6 10-bit video samples in
a digital video recorder with a video data rate of 200
Megabits per second would give a data rate typically enough
for 4 16-bit audio channels or a consumer-grade additional
data-reduced video signal while losing only 0.6 dB in video
S/N in the original video channel.
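The data-rate claim can be verified by simple arithmetic. In the sketch below, the 48 kHz audio sampling rate is an assumption made for illustration; the text does not state one.

```latex
\text{buried rate}
  = \frac{200\ \mathrm{Mbit/s}}{6 \times 10\ \mathrm{bits}}
  \approx 3.33\ \mathrm{Mbit/s}

\text{4 audio channels}
  = 4 \times 16\ \mathrm{bits} \times 48\,000\ \mathrm{s^{-1}}
  = 3.072\ \mathrm{Mbit/s} < 3.33\ \mathrm{Mbit/s}
```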
CLAIMS

1. A method of encoding digital data within digital
words representing signal waveforms, including the step of
modifying least significant digits of said digital words
representing signal waveforms in dependence upon said
digital data,

characterised by pseudo-randomising said digital data
thereby forming data noise words having levels small
relative to those of said waveform words,

subtracting the pseudo-randomised data words from said
waveform words thereby producing a dithered waveform word,
and

quantizing said dithered waveform word and adding said
data noise word to said quantized word thereby forming an
output of reduced noise carrying information representing
digital data in the least significant digits thereof.

2. A method according to Claim 1 further comprising
applying noise shaping around said step of quantizing,
thereby modifying the spectrum of the difference between
output and input waveform signals.

3. A method according to Claim 1 or 2, in which said
digital data is pseudo-randomised with a probability
distribution function such that the difference between the
output and input waveform signals has the form of a noise
signal substantially free of non-linear distortion products
related to the input waveform signal.

4. A method according to Claim 3, in which the
probability distribution function is such that the
difference between the output and the input waveform is
also substantially free of variations in statistics
dependent on the input waveform.

5. A method according to Claim 3 or 4 , in which, the 
probability distribution function is such that the 

35 difference between the output and input waveform signals is 
also substantially free of variations in statistics 
dependent on the encoded data. 
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A method according to any one of claims l to 5 in 
which said digital words represent audio signal waveforms, 
and said digital data comprises compressed data 
representing additional audio channels. 
'7. A method according to any one of claims 1 to 5 in 

Which said digital words represent audio signal waveforms, 
and- said digital data comprises • signals f or ^ use in- 
controlling the gain of said audio signals when reproduced 

8. A method according to any of claims l to 5, in which 
saxd digital words represent audio signal waveforms, and 
saxd digital data comprises one or more continuous control 
signals continuous in time. 

9. A method according to claim 8, wherein said 
contxnuous control signals are used to modify the 
reproduced parameters of the output audio waveform signal 
as a function of time. 

10. A method according to claim 9, wherein said
continuous control signal is used to modify the gain of the
reproduced audio waveform signal as a function of time.

11. A method according to any of claims 8 to 10, wherein
said continuous control signals are MIDI control signals.

12. A method according to claim 8 or 11, wherein said 
continuous control signals are used to control sound 
production and control devices. 

13. A method according to claim 12, wherein said
continuous control signals are used to control
MIDI-controlled musical instruments or MIDI-controlled
effects devices.

14. An encoder for encoding data within digital words
representing signal waveforms, comprising

means for receiving input digital waveform words
representing input waveform signals,

means for receiving input digital data,

means for outputting output digital waveform words
representing said waveform signal and incorporating said
digital data,

means for pseudo-randomising said data and for
forming a data noise signal having a level or range of
levels small relative to that of the waveform words,

means for subtracting said data noise signal from
said digital waveform words thereby producing dithered
waveform words,

means for uniformly quantizing said dithered
waveform words, and

means for adding said data noise signal to the output of
said means for uniformly quantizing thereby producing
output digital words representing said signal waveforms and
carrying said digital data in the least significant digits
thereof.
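
As an illustration only, the chain of means recited in Claim 14 can be sketched in Python. The function name `encode`, the choice of 2 data bits per word, the seeded `random.Random` generator and the modulo addition used as the pseudo-randomising function are all assumptions made for this sketch; none of them is specified by the claim.

```python
import random


def encode(samples, data, bits=2, seed=1234):
    """Hypothetical sketch: embed `bits` data bits in each integer word."""
    step = 1 << bits                  # quantizer step size (assumed 2**bits)
    rng = random.Random(seed)         # assumed pseudo-random source
    out = []
    for x, d in zip(samples, data):
        # Pseudo-randomise the data to form the data noise signal.
        noise = (d + rng.randrange(step)) % step
        # Subtract the data noise signal to produce the dithered word.
        dithered = x - noise
        # Uniformly quantize to the nearest multiple of the step size.
        quantized = step * ((dithered + step // 2) // step)
        # Add the data noise signal back; it now occupies the low digits.
        out.append(quantized + noise)
    return out
```

Because the quantizer output is a multiple of the step size, adding the data noise back leaves the pseudo-randomised data readable as the output word modulo the step size, while the change made to each word never exceeds half a step.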

15. An encoder according to Claim 14, further comprising
noise shaping means connected around said means for
quantizing effective to modify the spectrum of the
difference between output and input waveform signals in a
desired predetermined manner.
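
One possible reading of noise shaping connected around the quantizer is an error-feedback loop. The sketch below is an assumption, not the claimed circuit: it uses a first-order filter with an integer coefficient (so that the least significant digits still carry the data), a hypothetical function name `encode_shaped`, and a `noise` argument that is taken to hold already pseudo-randomised data values in the range 0 to 2**bits - 1.

```python
def encode_shaped(samples, noise, bits=2):
    """Hypothetical sketch: first-order error feedback around the quantizer."""
    step = 1 << bits                  # quantizer step size (assumed 2**bits)
    err = 0                           # error made on the previous sample
    out = []
    for x, n in zip(samples, noise):
        v = x - err                   # feed the previous error back to the input
        # Subtractively dithered uniform quantizer, as in the unshaped encoder.
        q = step * ((v - n + step // 2) // step)
        y = q + n
        err = y - v                   # error of the quantizer stage itself
        out.append(y)
    return out
```

With this filter the difference between output and input at each sample is the current quantizer error minus the previous one, so its energy is moved towards high frequencies and the differences summed over a passage stay within half a step.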

16. An encoder according to Claim 14 or 15, in which the 
means for uniformly quantizing said dithered waveform words 
comprise a uniform vector quantizer, as herein defined. 

17. A method according to any one of claims 1 to 5, in
which said digital words represent audio signal waveforms
over a predetermined bandwidth and said digital data
represents said audio signal in the frequency range
extending at least partly beyond said predetermined
bandwidth represented by the digital waveform word.

18. A method of encoding data within digital words
representing signal waveforms, including the steps of
quantizing said signal waveform word and modifying least
significant digits of said digital words representing
signal waveforms in dependence upon said digital data,

characterised by applying a reversible pseudo-random
function to said data prior to modifying said signal
waveform word and applying noise-shaping to said quantized
and modified signal waveform word thereby modifying the
spectrum of the difference between the input signal
waveform word and the output modified and quantized signal
waveform word.

19. A method of encoding data within digital words
representing signal waveforms, including the steps of
quantizing said signal waveform word and modifying least
significant digits of said digital words representing
signal waveforms in dependence upon said digital data,

characterised by applying noise-shaping to said
quantized and modified signal waveform word thereby
modifying the spectrum of the difference between the input
signal waveform word and the output modified and quantized
signal waveform word.

20. A method according to claim 18 or 19, in which, in
the step of modifying the signal waveform word, data words
small in level relative to the waveform words are added to
the signal waveform words subsequent to the quantization of
said signal waveform words.

21. A method of decoding digital data encoded within
digital words representing signal waveforms by a method
according to any one of claims 1 to 13, 18 to 20,
comprising

receiving said digital waveform words, separating
least significant digits from said digital waveform words,
applying a function effective as the inverse of the
pseudo-randomising function used in encoding the digital
data to the least significant digits thereby recovering the
data, and outputting the data.
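
The decoding steps of Claim 21 can be sketched as follows, under stated assumptions: the encoder is taken to have pseudo-randomised each data value by adding a number drawn from a seeded `random.Random` generator modulo the step size, so the inverse function subtracts the same sequence. The function name `decode`, the seed and the 2-bit data width are hypothetical choices for this sketch.

```python
import random


def decode(words, bits=2, seed=1234):
    """Hypothetical sketch: recover `bits` data bits from each integer word."""
    step = 1 << bits                  # assumed to match the encoder's step size
    rng = random.Random(seed)         # must reproduce the encoder's sequence
    data = []
    for y in words:
        lsbs = y % step               # separate the least significant digits
        # Invert the assumed pseudo-randomising function (modulo addition).
        data.append((lsbs - rng.randrange(step)) % step)
    return data
```

The decoder needs no knowledge of the waveform itself; it only has to stay in step with the pseudo-random sequence used when the data was encoded.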

22. An encoder for encoding data within digital words
representing signal waveforms, including means for
quantizing said signal waveform words and means for
modifying the least significant digits thereof in
dependence upon said digital data,

characterised by means for applying a reversible
pseudo-random function to said data prior to modifying said
signal waveform word and means for applying noise-shaping to
said quantized and modified signal waveform word thereby
modifying the spectrum of the difference between the input
signal waveform word and the output modified and quantized
signal waveform word.

23. An encoder for encoding data within digital words
representing signal waveforms, including means for
quantizing said signal waveform words and means for
modifying the least significant digits thereof in
dependence upon said digital data,

characterised by means for applying noise-shaping to said
quantized and modified signal waveform word thereby
modifying the spectrum of the difference between the input
signal waveform word and the output modified and quantized
signal waveform word.

24. An encoder according to claim 22 or 23, in which
the means for modifying are arranged to add data words
small in level relative to the waveform words to the signal
waveform words subsequent to the quantization of said
signal waveform words.

25. A method or apparatus according to any one of claims 
18 to 20, or 22 to 24, in which said digital words 
represent audio signal waveforms, and said digital data 
comprises compressed data representing additional audio 
channels. 

26. A method or apparatus according to any one of claims 
18 to 20, or 22 to 25, in which said digital words 
represent audio signal waveforms, and said digital data 
comprises signals for use in controlling the gain of said 
audio signals when reproduced. 

27. A method or apparatus according to any of claims 18 
to 20, or 22 to 26, in which said digital words represent 
audio signal waveforms, and said digital data comprises one 
or more continuous control signals continuous in time. 

28. A method or apparatus according to claim 27 wherein
said continuous control signals are used to modify the
reproduced parameters of the output audio waveform signal
as a function of time.

29. A method or apparatus according to claim 28, wherein
said continuous control signal is used to modify the gain
of the reproduced audio waveform signal as a function of
time.

30. A method or apparatus according to any of claims 27
to 29 wherein said continuous control signals are MIDI
control signals.

31. A method or apparatus according to any of claims 27
to 30, wherein said continuous control signals are used to
control sound production and control devices.

32. A method or apparatus according to claim 31, wherein
said continuous control signals are used to control
MIDI-controlled musical instruments or MIDI-controlled
effects devices.

33. [illegible] within [illegible] information
comprising [illegible] representing signal waveforms
[illegible] by a method according to
any one of claims 1 to 13, 18 to 20, or 25 to 32,

outputting the resulting signal via a recording or
transmission channel,

[illegible] the recording or transmission channel, and

[illegible] to claim [illegible].

34. A method according to Claim 33, in which the signals
output from the encoder are recorded on an audio CD.

35. An audio decoder for decoding a digital word
representing an audio signal waveform encoded by a method
according to claim 6, or claim 25, said digital data
encoded within said digital word containing additional
audio channels, the decoder comprising:

means for receiving digital waveform words,

means for separating least significant digits from
said digital waveform words,

means for applying a function effective as the
inverse of the pseudo-randomising function used in encoding
the data thereby recovering compressed data representing
additional audio channels,

means for decompressing said recovered data, and
means for outputting said audio signal waveform and
said recovered additional audio channels for reproduction
via a multi-channel audio reproduction system.

36. An audio decoder for decoding a digital word
representing an audio signal waveform encoded by a method
according to claim 6 or 25, said digital data encoded
within said digital word containing additional audio
channels, the decoder comprising:

means for receiving digital waveform words,

means for separating least significant digits from
said digital waveform words,

means for applying a function effective as the
inverse of the pseudo-randomising function used in encoding
the data thereby recovering compressed data representing
additional audio channels,

means for decompressing said recovered data, and

means for outputting said recovered additional audio
channels for audio reproduction.

37. An audio decoder for decoding a digital word
representing an audio signal waveform encoded by a method
according to claim [illegible], said digital data encoded
within said digital word containing level control signals
for controlling the dynamic range of said audio signal
waveform when reproduced, the decoder comprising:
means for receiving digital waveform words,

means for separating least significant digits from
said digital waveform words,

means for applying a function effective as the
inverse of the pseudo-randomising function used in encoding
the data thereby recovering level control signals, and means
for outputting said audio signal waveform and said level
control signals for reproduction via an audio system
responsive to said level control signals to modify the
dynamics of the reproduced audio signal.

38. An audio decoder for decoding a digital word
representing an audio signal waveform encoded by a method
according to any one of claims 1 to 5 or 17 to 20, 22, 25
to 32, said digital data encoded with said signal
representing said audio signal waveform at least partially
outside the bandwidth represented by said digital word, and
means responsive to said decoded digital word and digital
data for synthesising an audio signal of extended bandwidth
for reproduction.

39. An audio decoder for decoding a digital word
representing an audio signal waveform encoded according to
any of claims 1 to 6 or 25 to 32, said digital data encoded
within said digital word containing additional audio
channels, the decoder comprising:

means for receiving digital waveform words,
means for separating least significant digits from
said digital waveform words,

means for applying a function effective as the
inverse of the pseudo-randomising function used in encoding
the data thereby recovering compressed data representing
additional audio channels,

means for decompressing said recovered data, and
means for outputting said recovered additional audio
channels for audio reproduction.

40. An audio decoder according to claim 39, 
incorporating 

means for outputting said audio signal waveforms, 

and 

means for combining said audio signal waveforms and 
said additional audio channels for audio reproduction. 

41. An audio decoder according to claim 40, wherein said 
combining means feeds an adjustable mixing means for 
altering the mix or sound balance of the reproduced audio 
signals. 

42. An audio decoder according to claim 40, wherein said
combined audio signals are recovered for directional sound
reproduction via a multi-channel audio reproduction system.

43. A method according to any one of claims 18 to 20, in
which said digital words represent audio signal waveforms
over a predetermined bandwidth and said digital data
represents said audio signal in the frequency range
extending at least partly beyond said predetermined
bandwidth represented by the digital waveform word.